static replication strategies for content availability in...

21
Mobile Netw Appl DOI 10.1007/s11036-008-0120-y Static Replication Strategies for Content Availability in Vehicular Ad-hoc Networks Shyam Kapadia · Bhaskar Krishnamachari · Shahram Ghandeharizadeh © Springer Science + Business Media, LLC 2008 Abstract This study investigates replication strategies for reducing latency to desired content in a ve- hicular peer-to-peer network. We provide a general constrained optimization formulation for efficient repli- cation and study it via analysis and simulations em- ploying a discrete random walk mobility model for the vehicles. Our solution space comprises of a family of popularity based replication schemes each character- ized by an exponent n. We find that the optimal repli- cation exponent depends significantly on factors such as the total system storage, data item size, and vehicle trip duration. With small data items and long client trip durations, n 0.5 i.e., a square-root replication scheme provides the lowest aggregate latency across all data item requests. However, for short trip durations, n moves toward 1, making a linear replication scheme more favorable. For larger data items and long client trip durations, we find that the optimal replication exponent is below 0.5. Finally, for these larger data items, if the client trip duration is short, the optimal replication exponent is found to be a function of the total storage in the system. Subsequently, the above observations are validated with two real data sets: one S. Kapadia (B ) · B. Krishnamachari · S. Ghandeharizadeh Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA e-mail: [email protected] S. Ghandeharizadeh e-mail: [email protected] B. Krishnamachari Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA e-mail: [email protected] based on a city map with freeway traffic information and the other employing encounter traces from a bus network. Keywords data replication · availability · vehicular ad-hoc networks · optimization 1 Introduction Advances in computer processing, data storage, and wireless communications have made it feasible to en- vision on-demand delivery of content such as audio and video clips between mobile vehicles. The content exchanged between the vehicles may vary from traffic information such as accident notifications and emer- gency vehicle arrival notifications to multimedia for entertainment such as audio files, cartoons, movies and other video files. A vehicle is equipped with an AutoMata (formerly known as a C2P2 for Car-to-Car Peer-to-Peer [10]) device consisting of several gigabytes of storage, a fast processor and a short-range wireless interface with bandwidths of several tens of Mbps [2]. The AutoMata-equipped vehicles, forming an intermit- tently connected network [23, 26], collaborate to realize an application for on-demand delivery of entertain- ment content. In such a system, when a client vehicle issues a re- quest for desired content not found in its local storage, this request can be satisfied only when it is in the vicinity of another vehicle that carries the requested item. Therefore, a key metric is availability latency, defined as the time between request issuance and re- quest satisfaction. Clearly, the more the number of vehicles carrying a certain data item, the lower will be

Upload: others

Post on 30-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Mobile Netw ApplDOI 10.1007/s11036-008-0120-y

    Static Replication Strategies for Content Availabilityin Vehicular Ad-hoc Networks

    Shyam Kapadia · Bhaskar Krishnamachari ·Shahram Ghandeharizadeh

    © Springer Science + Business Media, LLC 2008

    Abstract This study investigates replication strategiesfor reducing latency to desired content in a ve-hicular peer-to-peer network. We provide a generalconstrained optimization formulation for efficient repli-cation and study it via analysis and simulations em-ploying a discrete random walk mobility model for thevehicles. Our solution space comprises of a family ofpopularity based replication schemes each character-ized by an exponent n. We find that the optimal repli-cation exponent depends significantly on factors suchas the total system storage, data item size, and vehicletrip duration. With small data items and long clienttrip durations, n ∼ 0.5 i.e., a square-root replicationscheme provides the lowest aggregate latency across alldata item requests. However, for short trip durations,n moves toward 1, making a linear replication schememore favorable. For larger data items and long clienttrip durations, we find that the optimal replicationexponent is below 0.5. Finally, for these larger dataitems, if the client trip duration is short, the optimalreplication exponent is found to be a function of thetotal storage in the system. Subsequently, the aboveobservations are validated with two real data sets: one

    S. Kapadia (B) · B. Krishnamachari · S. GhandeharizadehDepartment of Computer Science, University of SouthernCalifornia, Los Angeles, CA 90089, USAe-mail: [email protected]

    S. Ghandeharizadehe-mail: [email protected]

    B. KrishnamachariDepartment of Electrical Engineering,University of Southern California,Los Angeles, CA 90089, USAe-mail: [email protected]

    based on a city map with freeway traffic informationand the other employing encounter traces from a busnetwork.

    Keywords data replication · availability ·vehicular ad-hoc networks · optimization

    1 Introduction

    Advances in computer processing, data storage, andwireless communications have made it feasible to en-vision on-demand delivery of content such as audioand video clips between mobile vehicles. The contentexchanged between the vehicles may vary from trafficinformation such as accident notifications and emer-gency vehicle arrival notifications to multimedia forentertainment such as audio files, cartoons, moviesand other video files. A vehicle is equipped with anAutoMata (formerly known as a C2P2 for Car-to-CarPeer-to-Peer [10]) device consisting of several gigabytesof storage, a fast processor and a short-range wirelessinterface with bandwidths of several tens of Mbps [2].The AutoMata-equipped vehicles, forming an intermit-tently connected network [23, 26], collaborate to realizean application for on-demand delivery of entertain-ment content.

    In such a system, when a client vehicle issues a re-quest for desired content not found in its local storage,this request can be satisfied only when it is in thevicinity of another vehicle that carries the requesteditem. Therefore, a key metric is availability latency,defined as the time between request issuance and re-quest satisfaction. Clearly, the more the number ofvehicles carrying a certain data item, the lower will be

  • Mobile Netw Appl

    its expected availability latency. However, constraintson storage limit the number of unique data items thateach vehicle can carry.

    All data items in the system repository are not likelyto be equally requested.1 The aggregate availabilitylatency across all items is therefore the average latencyfor individual data items weighed by their respectivepopularities. Hence, in order to minimize this aggregatelatency metric, the popular data items warrant morereplicas. We address the following key question inthis study: how many replicas should be allocated foreach item in the repository in order to minimize theaggregate availability latency?

    Several parameters impact the optimal replicationscheme. For instance, the mobility model of the vehiclesimpacts how the expected availability latency for a dataitem decreases with the number of replicas for thatitem. The size of a data item and the available band-width between devices dictates whether it is possibleto download the entire data item in a single encounterbetween a client and a vehicle carrying the requesteditem. Moreover, the client trip duration bounds themaximum amount of time that a client is willing to waitfor a request to be satisfied. The distribution of the dataitem popularities is a key component of the aggregatelatency metric. The available storage directly affects theconstraints under which the optimal replication strategycan be found.

    Our primary contributions are as follows. We firstprovide a general optimization formulation for mini-mizing the average availability latency subject to a stor-age constraint per vehicle. To solve this optimization,the solution space we explore comprises a family ofpopularity-based replication schemes each character-ized by an exponent n. This exponent, n, defines therelation between the replicas of a data item and itspopularity. We are interested in the optimal replicationexponent that minimizes the aggregate availability la-tency. With small data items and long client trip dura-tions, we find that n ∼ 0.5 i.e., a square-root replicationscheme provides the lowest aggregate latency. How-ever, for short trip durations, we find that the optimalreplication exponent (n) moves toward 1, making alinear replication scheme more favorable. For largerdata items and long client trip durations, we find thatthe optimal replication exponent is below 0.5. In thelimit for extremely large data items, a random (n = 0)replication scheme that allocates the same number ofreplicas to all data items yields the minimum aggregate

    1It is found that Zipf’s law controls many of the features observedwith the Internet, primarily because of different user preferencesfor different files [3].

    latency. Finally, for such data items, if the client tripduration is short, we find that the optimal replicationexponent is a function of the total storage in the system.Specifically, in low storage scenarios, a linear schemeshows superior performance, while for moderate tohigh storage scenarios a square-root replication schemeis preferred. Moreover, if the storage is abundant evena random replication scheme is good enough to providea low aggregate latency.

    While the above results are based on a 2D ran-dom walk based mobility model for the vehicles, inthe second part of this study, we validate our modelobservations with vehicular movements obtained fromreal data sets. Specifically, two independent validationphases are presented employing (a) A real map of anurban environment that dictates the mobility transi-tions of the Markov model and (b) Fine grained mo-bility traces from a real environment comprising busesmoving around a university campus area. The observa-tions from these studies indicate that a random walk-based mobility model captures performance trends thatmay be applicable for a wide range of scenarios.

    The rest of this paper is organized as follows.Section 2 gives a brief overview of the related workin the area. Section 3 presents the general optimizationformulation and details about the family of frequency-based replication schemes explored in our study.Section 4 employs mathematical analysis and simula-tions to determine the optimal replication scheme forsmall data items and long client trip durations. Theresults for small data items are extended to considershort client trip durations in Section 5. Sections 6 and 7present the optimal replication scheme for larger dataitems with long and short client trip durations, respec-tively. Subsequently, representative results obtainedwith the random walk model are validated on a mapof the city of San Francisco in Section 8 as well as areal data set comprising of movement traces of buses ina small neighborhood in Amherst (Section 9). Finally,Section 10 presents conclusions and future researchdirections.2

    2 Related work

    Techniques to determine number of replicas are sim-ilar to assigning seats to different parties as a func-tion of their popularity, i.e., ratio of votes casted fora party to the total vote. Webster’s divisor methodand its alternatives attributed to Hamilton, Adams and

    2A detailed technical report of this study is available as [22].

  • Mobile Netw Appl

    Jefferson allocate storage to each replica (assign seatsto a party) as a linear function of its frequency of access(popularity). The divisor technique has been employedto determine the number of replicas of a video clip ina distributed video-on-demand architecture [27]. Ourproposed framework captures this technique by settinga key parameter, denoted as n (see Section 3), to one.

    Techniques to compute number of replicas for ob-jects have been studied for both peer-to-peer networks[5] and mesh community networks [7, 11]. Mobility ofnodes is our primary contribution and separates theseprior studies from the work presented here. In the fol-lowing, we provide an overview of these prior studies.Replication of objects is important to their discovery inan un-structured peer-to-peer network. A smart repli-cation technique minimizes search size, defined as thenumber of walks required to locate a referenced dataitem. In [5], the divisor method is compared with onethat employs the square root of the frequency of access,demonstrating the superiority of the later.

    Replication in MANETs has been explored in a widevariety of contexts. Hara [13] proposes three replica al-location methods. The first one that allocates replicas tonodes only on the basis of their local preference to dataitems. The second technique extends the first by con-sidering the contents of the connected neighbors whileperforming the allocation to remove some redundancy.The last technique discovers bi-connected componentsin the network topology and allocates replicas accord-ingly. The frequency of access to data items is knownin advance and does not change. Moreover, the replicaallocation is performed in a specific period termed therelocation period. Several extensions to this work havebeen proposed where replica allocation methods havebeen extended to consider data items with periodic[14, 15] and aperiodic [18] updates. Further extensionsto the proposed replica allocation methods consider thestability of radio links [19], topology changes [20] andlocation history of the data item access log [16, 17].

    Our study differs from prior studies in the followingways. We formulate a general optimization problemthat minimizes an aggregate latency metric subject toa storage constraint per vehicle. We propose a fam-ily of replication schemes and explore which schemeprovides optimal latency under a variety of scenariosby solving the optimization formulation. The mobilityof vehicles is represented by a Markov-based mobilitymodel that is general enough to capture a wide va-riety of mobility models such as Freeway, Highway,Random Way-point etc. We have analyzed the latencyperformance obtained with such a mobility model viamathematical analysis and extensive simulations. Thedifferent scenarios encompass vehicles with unbounded

    as well as finite trip durations, data items with differentdisplay times etc.

    3 General framework

    In this section, we first introduce some definitions andassociated terminology used repeatedly in this paper.Then, we provide a general optimization formulationfor minimizing availability latency in the presence ofstorage constraints. Table 1 summarizes the notationused in this study.

    Assume a network of N mobile AutoMata devices,each with storage capacity of α bytes. The total storagecapacity of the system is ST=N · α. There are T dataitems in the repository, each with a display time of�i seconds and display bandwidth requirement of βi.Hence, the size of each item is given by Si = �i · βi.The frequency of access to item i is denoted as fi with∑T

    j=1 f j = 1. Let the trip duration of the client Au-toMata under consideration be γ . We now define thenormalized frequency of access to the item i, denotedRi, is:

    Ri = ( fi)n

    ∑Tj=1( f j)n

    ; 0 ≤ n ≤ ∞ (1)

    Ri is normalized to a value between 0 and 1. Thenumber of replicas for data item i, denoted as ri, is:

    ri = min(

    N, max(

    1,

    ⌊Ri · N · α

    Si

    ⌋))

    (2)

    This defines a family of replication schemes thatcomputes the degree of replication of item i as thenth power of its frequency of access. The exponent ncharacterizes a particular replication scheme. Hence, rilies between 1 and N. Note that ri includes the originalcopy of item i. One may simplify Eq. 2 by replacing themax function with � Ri·N·αSi �, which would allow value ofri to drop to zero. This means that there is no copy ofitem i in the AutoMata network. A hybrid frameworkmight provide access to the item i. For example, a basestation employing IEEE 802.16 [12] might facilitateaccess to a wired infrastructure with remote serverscontaining the item i. However, in this study, we assumeat least one copy of every data item must be present inthe ad-hoc network at all times.

    The availability latency for a given request for itemi, denoted as δi, is defined as the time after which aclient AutoMata will find at least one replica of itsrequested item accessible to it, either directly or viamultiple hops. Additionally, δi must be greater than �i,the data item display time. If a replica for item i is notencountered for a given request, we set δi to γ . This

  • Mobile Netw Appl

    Table 1 Terms and theirdefinitions

    Database ParametersT Number of data items.Si Size of data item i.�i Display time of data item i.βi Bandwidth requirement of data item i.fi Frequency of access to data item i.Replication ParametersRi Normalized frequency of access to data item i,

    Ri = ( fi)n∑Tj=1( f j)n

    ; 0 ≤ n ≤ ∞ri Number of replicas for data item i,

    ri = min (N, max (1, � Ri·N·αSi �))n Characterizes a particular replication scheme.δi Average availability latency of data item iδagg Aggregate availability latency for replication technique

    using the nth power, 0 ≤ n ≤ ∞, δagg = ∑Tj=1 δ j · f jAutoMata System ParametersN Number of AutoMata devices in the system.α Storage capacity per AutoMata.γ Trip duration of the client AutoMata.ST Total storage capacity of the AutoMata system, ST = N · α.

    indicates that item i was not available to the clientduring its journey. Also, if �i exceeds γ for a certainitem i then we set δi to γ . We are interested in theavailability latency observed across all the data items.Hence, we augment the δi for every item i with its fi.This is termed the aggregate availability latency (δagg)metric. It is computed as follows. For each item i,calculate the average availability latency (δi) based onthe particular replication scheme of interest. Then theseavailability latencies are combined into a single metric:δagg = ∑Ti=1 δi · fi.

    The aggregate availability latency depends on thevalue chosen for n, since n determines the replicas perdata item. Intuitively, a higher number of replicas foritem i will reduce the availability latency for a requestfor that item. The core problem of interest here is tokeep the aggregate availability latency as low as pos-sible by tuning the data item replication levels, inthe presence of storage constraints. We assume thatthe data item repository size is smaller than the totalstorage capacity of the system,

    ∑Ti=1 Si ≤ ST . Other-

    wise, data items cannot be replicated when at leastone replica of a data item must be present in the sys-tem. More formally, the optimization problem can bestated as,

    Minimize δagg, subject to∑T

    i=1 Si ≤ ST (3)

    Implicit in this formulation is the design variable,namely, the desired replication for each data item.The replication exponent n determines a ri value foreach data item i with the objective to minimize δagg.

    This minimization is a challenge when the total sizeof the database exceeds the storage capacity of a car,∑T

    i=1 Si > α. Otherwise, the problem is trivial and canbe solved by replicating the entire repository on eachdevice.

    The optimization space that defines what value of nprovides the best δagg is quite large and consists of thefollowing parameters: (i) density of cars, (ii) data itemdisplay time, (iii) data item size, (iv) display bandwidthper data item, (v) data item repository size, (vi) storageper car, (vii) client trip duration, (viii) frequency ofaccess to the data items, and (ix) mobility model forthe cars. In this study, we determine how the aggregateavailability latency and hence the optimal replicationscheme is affected by each of these parameters.

    3.1 Model preliminaries

    In this section, we provide details about the discreteMarkov mobility model adopted in this study. We as-sume a repository of homogeneous data items withidentical bandwidth requirement, display time, and size(βi = β, �i = �, Si = S). Figure 1 shows an examplemap used in our study that comprises with fixed sizecells.3 The vehicles4 themselves are assumed to be dis-tributed across these cells. Only AutoMatas within a

    3In the analysis, the map is considered to be a torus to avoidborder effects [21, 25].4We use the term AutoMata and a vehicle (or car) interchange-ably in this study, with the assumption that each vehicle isequipped with a single AutoMata device.

  • Mobile Netw Appl

    1

    8

    15

    22

    29

    36

    6

    11

    16

    21

    26

    31

    2 3 4 5

    7 9 10 12

    13 14 17 18

    19 20 23 24

    25 27 28 30

    32 33 34 35

    Start cell

    Figure 1 An example 6 × 6 map

    cell communicate with each other either directly if theyare in radio-range or via other AutoMatas using multi-hop transmissions. In other words, the AutoMataswithin a cell form a connected sub-network. AutoMatasin adjacent cells cannot communicate with each other.

    A Markov mobility model describes the movementof the vehicles where the vehicles perform a 2D randomwalk. Hence, in each time step, a vehicle in a given cellmay transition to any one of its neighboring 4 cells.Each cell of the map constitutes a state. A map ofsize G × G yields G2 states. These states are self-contained and a transition from one state to anotheris independent of the previous history of a car in thatstate. The aggregate of the transitions from each cell(state) to every other state gives the G × G probabilitytransition matrix Q = [qij] where qij is the probabilityof transition from state i to state j.

    Using Markov chains, it is possible to estimate thedistribution of the steady-state probabilities of beingin the various cells, by solving � = � ∗ Q, where �is the vector representing the steady-state probabilitiesof being in the various cells (states). While for the 2Drandom walk mobility model, due to symmetry and ig-noring border effects, � = 1G , in general, the map mayrepresent an underlying city area where the transitionprobabilities for each cell may not be symmetric ineach of the 4 possible directions. For example, in theexample map depicted in Fig. 1, the mobility modelis weighted toward the diagonal both from the left toright and vice-versa. This may be due to the presence oftwo major freeways running across a city area that themap represents. Note that in order to incorporate di-rectionality, the transition for a vehicle from its currentcell may incorporate the current as well as the previous

    location of the vehicle (see Section 8). In this case,by incorporating previous as well as current vehiclelocation information in the state, the mobility modelcan still be solved employing Markov chains.

    Without any loss of generality, to reduce the di-mensionality of the problem, we express the data itemdisplay time, �, as the amount of time required byan AutoMata equipped vehicle to travel � cells. Weexpress α as the number of storage slots per AutoMata.Each storage slot stores a data item fragment equiv-alent to a single cell worth of data item display time.Moreover, we assume the amount of data displayed ineach cell is identical. Now, we represent both the sizeof a data item and the storage slots in terms of thenumber of cells. This means that a data item has adisplay time of � cells and an AutoMata has α unitsof cell storage. For example, a data item with displaytime of 4 cells (� = 4) requires 4 storage slots and anAutoMata provides 100 storage slots (α = 100). Hence,from hereon, we assume that the size of the data item isindicated by its display time.

    The trip duration (γ ) is the maximum amount oftime that a client vehicle is willing to wait for requestsatisfaction. Here, it is expressed as the maximum num-ber of cells that the client is willing to traverse before itgives up on a given request for a data item. We alsodefine availability latency (δi) for data item i in discreteterms, as the number of cells after which a client Au-toMata will encounter a replica of the requested dataitem i, either directly or via multiple hops, for the dataitem display time (�). Hence, the possible values of theavailability latency are between 0 and γ . While in mostcases, the trip duration is usually short, occasionally aclient may specify an extremely large value of γ indicat-ing that it is willing to wait as long as it takes to satisfyan issued request. As we shall show in the followingsections, the long versus short client trip durations havea profound impact on the optimal replication schemethat minimizes the aggregate availability latency.

    4 Data items with small size and long clienttrip duration

    In this section, we consider small data items with a dis-play time of one, where the client trip duration is long.We first present analytical approximations that capturethe performance of availability latency for an item as afunction of the number of replicas for that item for botha low and high density of replicas. Subsequently, weemploy simulations to determine the optimal replica-tion exponent that minimizes the aggregate availabilitylatency.

  • Mobile Netw Appl

    4.1 Analysis of data items with display time = one

    In this section, for a scenario with a sparse density ofdata item replicas, we derive a closed-form expressionfor the aggregate availability latency. Subsequently, weuse this expression to solve the optimization problemto reveal that a square-root replication scheme mini-mizes this latency. Then, we derive an expression thatapproximates the aggregate availability latency in caseof a high density of data item replicas.

    4.1.1 Sparse scenario

    In this section, we provide a formulation that capturesscenarios with a low density of vehicles. For a givenmobility model of the vehicles, the relationship be-tween δi and ri can be obtained using simulations. Forillustration, we have considered that vehicles followa random walk-based mobility model on a 2D-torus.Aldous and Fill [1] show that the mean of the hittingtime for a symmetric random walk on the surface of a2D-torus is �(GlogG) where G is the number of cellsin the torus. Moreover, the mean of the meeting timefor 2 random walks is half of the mean hitting time.Furthermore, the distribution of the meeting times foran ergodic Markov chain can be approximated by anexponential distribution of the same mean [1]. Hence,

    P(δi > t) = exp( −t

    c · G · logG)

    (4)

    where the constant c � 0.34 for G ≥ 25. Now sincethere are ri replicas, there are ri potential servers.Hence, the meeting time, or equivalently the avail-ability latency for the data item i is the time till itencounters any of these ri replicas for the first time.This can be modeled as a minimum of ri exponentials.Hence,

    P(δi > t) = exp(

    −tc · Gri · logG

    )

    (5)

    Note, however that this formulation is valid only forthe cases when G >> ri, which is the case for sparsescenarios. The expected value of δi is given by:

    δi = c · G · logGri (6)

    For a given 2D-torus, G is constant, hence we have δi ∝1ri

    or equivalently, δi = Cri where C = c · G · logG.

    Hence, we have the following optimization formula-tion,

    Min

    [T∑

    i=1fi · Cri

    ]

    Subject to (7)

    T∑

    i=1ri = N · α; 1 ≤ ri ≤ N ∀ i = 1 to T (8)

    Theorem 1 The solution to the optimization formula-tion presented in Eqs. 7–8 is:

    ri =

    ⎧⎪⎪⎪⎪⎪⎨

    ⎪⎪⎪⎪⎪⎩

    √fi·N·α

    ∑Tj=1

    √f j

    1N·α ≤

    √fi

    ∑Tj=1

    √f j

    ≤ 1α

    max(

    1, min(√

    fi·Cγ0

    , N))

    in the general case

    where γ0 is s.t.∑Ti=1 ri = N · α

    (9)

    In other words, in case of a sparse density of vehicles,a replication scheme that allocates data item replicas as afunction of the square-root of the frequency of access todata items minimizes the aggregate availability latency.This is valid for data items with display time = 1.

    Proof We solve the above optimization using themethod of Lagrange multipliers. First, we prove part(i)of the theorem.

    The Lagrangian for the optimization can be writtenas:

    H =T∑

    i=1

    fi · Cri

    + ϕ[

    T∑

    i=1ri − N · α

    ]

    (10)

    We solve for ri to get:

    ri =√

    fi · N · α∑T

    j=1√

    f j(11)

    The constraints are satisfied if 1N·α ≤√

    fi∑T

    j=1√

    f j≤ 1

    α

    which proves part (i) of the theorem.Without this condition on fi, the above optimization

    can be re-written as the following Lagrangian taking allthe constraints into account as:

    G =T∑

    i=1

    fi · Cri

    + γ0[

    T∑

    i=1ri − N · α

    ]

    −T∑

    i=1γi · (ri − N · α) −

    T∑

    i=1βi · (−ri + 1)

  • Mobile Netw Appl

    The Kuhn Tucker Conditions for the modifiedLagrangian are:

    − fi · Cr2i

    + γ0 − γi + βi = 0; ∀ i = 1 to T (12)

    T∑

    i=1ri ≤ N · α, γ0 ≥ 0, and γ0

    [T∑

    i=1ri − N · α

    ]

    = 0 (13)

    ri ≤ N, γi ≥ 0, and γi · (ri − N) = 0; ∀ i = 1 to T (14)

    −ri ≤ −1, βi ≥ 0, and βi · (−ri + 1) = 0; ∀ i = 1 to T(15)

    Solving Eq. 12, we get,

    ri =√

    fi · Cγ0 − γi + βi (16)

    Equations 14 and 15 imply that either γi = 0 or ri =N and also either βi = 0 or ri = 1 respectively. There-fore, the optimum solution for ri is given by,

    ri = max(

    1, min

    (√fi · Cγ0

    , N

    ))

    (17)

    where γ0 is such that∑T

    i=1 ri = N · α proving part (ii) ofthe theorem. �

    Hence, in a sparse network, the optimal replica-tion that minimizes the aggregate availability latencyis obtained if the number of replicas for a data itemis proportional to the square root of the frequencyof access for that data item. Cohen and Shenker [5]proved that for unstructured peer-to-peer networks theexpected search size is minimized using a square-rootreplication strategy which is shown to be optimal. Theaggregate availability latency metric in wireless mobilead-hoc networks is analogous to the expected searchsize used in peer-to-peer networks.

    It should be noted that, in general, the optimal repli-cation depends on how δi is related to ri i.e. δi = F(ri)and F(·) is the function that will determine the optimalreplication strategy. The above methodology can beused to be obtain the optimal number of replicas as longas F(·) is differentiable.

    Figure 2 shows the typical trend shown by δi for a10 × 10 torus, where ri is increased from 1 to N whereN = 100. In other words, in a G = 100 cell torus, N =100 cars are deployed, with ri of them having a replica

    0 20 40 60 80 10010

    -3

    10-2

    10-1

    100

    101

    102

    Number of replicas ( ri )

    Ave

    rage

    ava

    ilabi

    lity

    late

    ncy

    Sparse-approximationSimulation

    Figure 2 Sparse analysis (Eq. 6) versus simulation obtainedaverage availability latency for a data item as a function of itsreplicas for a 10 × 10 torus, when the number of cars is set to 100

    for the data item. We only consider a single data item,a request for that item can be issued at any vehiclechosen uniformly at random among all the cars. If theitem is stored locally, the latency is 0. This result isindependent of the storage per car because a maximumof one copy of a given data item may be stored in a car.Figure 2 indicates that when ri is small, (ri ≤ 20) the an-alytical approximation in Eq. 6 is valid. Subsequently,latency reduces at a much faster rate when comparedto that predicted by the sparse approximation. This isbecause for a given G, as ri increases, the latency till anyone of the ri replicas is encountered can no longer bemodeled as the minimum of ri independent exponen-tials. In the next section, we provide an approximationthat captures the high density case.

    4.1.2 Dense scenario

    In this section, we provide an analytical formulationthat captures the trends shown by the availability la-tency in the presence of a high density of replicas.Recall that N cars are distributed uniformly at randomacross G cells, ri of the N cars carry a copy of the dataitem of interest. Here, we use the traditional definitionof the expected availability latency for data item i, δi =∑∞

    k=0 k · P(δi = k).We first determine an expression for the case when

    the latency is 0. This occurs if the data item is locallystored at a client or a data item replica is located in thesame cell as the client at which the request is issued.

  • Mobile Netw Appl

    Hence, the probability that the latency experienced bya client is zero is given by the following expression:

    P(δi = 0) = riN +(

    1 − riN

    )·(

    1 −(

    1 − 1G

    )ri)

    (18)

    Figure 3a indicates that the analytical expressionabove matches the simulation results quite well. For agiven car density N, as the density of replicas increases,the probability that the availability latency experiencedby a client is zero also increases. Figure 3b showshow this probability varies with increasing car density.Given a torus comprising G cells, increase in P(δi = 0)shows a decreasing steepness as N increases. This is

    0 10 20 30 40 500

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Number of Replicas (ri)

    Analysis Simulation

    (a)

    0 50 100 150 200 2500

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Number of Replicas (ri)

    P(

    δ i =

    0)

    P(

    δ i =

    0)

    N=50N=100N=150N=200N=250

    (b)

    Figure 3 a shows the validation of the analytical expression inEq. 18 for the probability that availability latency is zero whenN = 50. b shows the probability that the availability latency iszero as a function of the replicas for the data item for 5 differentcar densities {50, 100, 150, 200, 250}

    because with increasing N, the number of potentialclients from which a request can be issued also goes up.Hence, a given value of ri implies a greater percentageof the vehicles store the requested item i locally for asmaller N as compared to a larger one. Consequently,the corresponding P(δi = 0) is lower for a smaller N ascompared to a larger one.

    We provide an approximation assuming a memory-less mobility model without regards to the shape of theregion across which the vehicles are moving. Define,Ak, the event that a data item i is encountered by theclient for the first time in the kth cell. This implies thatthe item i was not encountered in any of the previousk − 1 cells. Let P(Ak) denote the probability of eventAk occurring. Note P(Ak) is a joint probability func-tion. Let pk denote the probability of encountering dataitem i in the kth cell, given that it was not encounteredin the previous k − 1 cells. Note that pk is a conditionalprobability. Also, p1 = P(δi = 0) as defined by Eq. 18.Then,

    pk = 1 −(

    1 − 1G − k + 1

    )ri; 2 ≤ k ≤ G (19)

    Note that the model assumes that not encounteringthe data item in the (k − 1)th cell increases the proba-bility of encountering it in the kth cell. Moreover, whenk = G, pk = 1 no matter what the value of ri, meaningthat the maximum latency that a client will encounterwill always be

  • Mobile Netw Appl

    0 20 40 60 80 10010

    -3

    10-2

    10-1

    100

    101

    102

    Number of replicas ( ri )

    Ave

    rage

    ava

    ilabi

    lity

    late

    ncy

    Sparse-approximationSimulationDense Approximation

    Figure 4 The complete picture depicting the availability latencyfor a data item obtained via simulations as compared with itssparse and dense approximation as a function of its replicas fora 10 × 10 torus, when the number of cars is set to 100

    replication exponent that minimizes the aggregateavailability latency.

    4.2 Simulation results of data itemswith display time = one

    We present simulation results indicating the replicationexponent range that provides near optimal aggregateavailability latency. We also show how this latency isaffected by the different parameters in the optimizationspace. In all our experiments, we assume that the var-ious data item popularities are distributed as per theZipf’s law [28]. This means that the frequency of therth popular data item is inversely proportional to itsrank i.e.

    fi =1iv

    ∑Tj=1

    1jv

    ; 1

  • Mobile Netw Appl

    is increased. The replication schemes with exponentvalues 0, 0.5, and 1 have been popularly studied inthe literature and are labelled random, square-root,and linear respectively. The value of w captures theskewness in the data item popularities, a higher value ofw indicates that most of the popularity weight is spreadacross the first few popular titles. Below we describe themain observations from this figure.

    The random scheme allocates the same number ofreplicas per data item irrespective of their popularity.Hence, in all cases, it yields the same aggregate avail-ability latency irrespective of the value of w. As thereplication exponent increases from 0 to 1 progressivelymore replicas are allocated for the popular data items.This increase in the replicas is accelerated for highervalues of w that provide a bias for the popular titles.Hence, we see a sharp decrease in the availability la-tency from n = 0 to n = 0.3 for w = −1.5 and w = −2.However, the maximum number of replicas per dataitem can never exceed N. For a value of w = −0.5 inwhich case the popularity weight is spread more evenlyamong all the data items, it almost doesn’t matter whatthe replication scheme is as seen by the relatively flatlatency curves.

    When storage per car is low, α = 4, this representsa scenario with a sparse density of data item replicas.In this case, the square-root replication scheme pro-vides the minimum latency. Also, the range where thereplication exponent n varies from 0.4 to 0.6 shows alatency very close to the square-root scheme. This istrue even when the data item popularities are skewed.Moreover, the range 0.4 ≤ n ≤ 0.6 shows near optimallatency performance even when the storage is increased(see Fig. 5b and c). In other words, through the entirespectrum of the replica density, a replication schemedefined by an exponent in this range will provide nearoptimal performance. For the rest of this study, we willconsider the square-root (n = 0.5) scheme as represen-tative of this range and compare its performance to thetwo extremes namely, random (n=0) and linear (n=1).Next, we present results from a set of experimentsthat are representative of the general trend observedwith respect to the relative performance of the threereplication schemes.

    4.3 Variation in car density

    Figures 6 and 7 presents the performance of the threereplication schemes, for popularity skewness values of−0.5 and −2.0 respectively, as a function of the cardensity when the storage per car is held constant at 3 forT = 100. Increase in the car density increases the totalstorage in the system. Hence, more replicas per data

    0 50 100 150 200 2500

    10

    20

    30

    40

    50

    60

    Number of cars (N)

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y (δ

    agg)

    Sqrt LinearRandom

    0 50 100 150 200 25010

    0

    101

    102

    103

    Number of cars (N)

    % w

    rt s

    quar

    e-ro

    ot s

    chem

    e

    LinearRandom

    Figure 6 Aggregate availability latency for the three replicationschemes as a function of the car density for w = −0.5 when thestorage per car is fixed at 3, T = 100

    item can be allocated resulting in an overall decreasein the aggregate availability latency. This is true for allreplication schemes. However, here for w = −0.5 andw = −1 (beyond N = 100), the random scheme showsslightly better performance than the linear scheme.This is because for low skewness parameters assigningequal number of replicas per data item is better thanproviding higher replicas for the popular data itemswhich do not have a sufficiently high popularity weight.However, for higher skewness in popularity (w = −1.5and w = −2.0), the behavior of the linear scheme startspaying richer dividends in reducing the overall latency,hence, it outperforms the linear scheme. In all cases, thesquare-root scheme always yields the lowest aggregateavailability latency. Similar performance trends are ob-served when the storage in the system is varied by in-creasing the storage per car or increasing the data item

  • Mobile Netw Appl

    0 50 100 150 200 2500

    10

    20

    30

    40

    50

    60

    Number of cars (N)

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y (δ

    agg)

    Sqrt LinearRandom

    0 50 100 150 200 25010

    0

    101

    102

    103

    Number of cars (N)

    % w

    rt s

    quar

    e-ro

    ot s

    chem

    e

    LinearRandom

    Figure 7 Aggregate availability latency for the three replicationschemes as a function of the car density for w = −2.0 when thestorage per car is fixed at 3, T = 100

    repository size keeping the other parameters constant.To avoid redundancy, we do not present those resultshere.

    5 Data items with small size and short clienttrip duration

    The analysis and simulation results presented so farassumed that once a request is issued at a client, it iswilling to wait as long as it takes for its request to besatisfied. In other words, the client trip duration wasassumed to be unbounded. For the specific mobilitymodel under consideration, namely 2D random walk ona torus, the maximum latency experienced by a clientis bounded [1] as long as at least one replica of everyitem is present in the system at all times. However,in more practical scenarios, the client may have a cer-

    tain maximum time it is willing to wait for requestresolution. This is captured by considering a finite tripduration, γ , for the client. The availability latency foritem i, δi, can be any value between 0 and γ − 1. If theclient’s request is not satisfied, we set δi = γ indicatingthat the client’s request for item i was not satisfied.

    5.1 Analysis

    As before with Section 4.1, here, we derive expressionsfor average availability latency of a data item as afunction of its replicas for a short client trip duration.Below, we present approximations in the case of lowand high density of replicas.

    5.1.1 Sparse approximation

    Recall that latency in the case of a 2D-random walk ona torus can be modeled as an exponential distributionas:

    P(δi > t) = λ exp (−λt) (21)where λ = ric·G·log G . The average availability latencywith finite trip duration γ is then given by,

    δi =∫ γ

    0xλ exp (−λt)dx +

    ∫ ∞

    γ

    γ λ exp (−λt)dx (22)

    Hence, we get

    δi = c · G · log Gri ·[

    1 − exp( −γ · ri

    c · G · log G)]

    (23)

    5.1.2 Dense approximation

    Recall that as defined in Section 4.1.2, Ak is the eventthat a data item i is encountered by the client for thefirst time in the kth cell and P(Ak) is the probabilitythat event Ak occurs. Also, pk is the probability ofencountering data item i in the kth cell, given that it wasnot encountered in the previous k − 1 cells. Then,

    pk = 1 −(

    1 − 1G − k + 1

    )ri; 2 ≤ k ≤ γ (24)

    Also, we rewrite P(Ak) incorporating the finite tripduration constraint as, P(Ak) = pk ∏k−1j=1 (1 − pj) ; 2 ≤k ≤ γ . Let P(Aγ+1) denote the probability of not en-countering the data item i during the entire trip du-ration γ . Hence, P(Aγ+1) = ∏γj=1 (1 − pj) Then, theaverage availability latency for data item i is given by,δi = ∑γ+1k=1 (k − 1)P(Ak).

    Figure 8 shows that the above approximations forlow and high density of replicas matches the latency

  • Mobile Netw Appl

    0 10 20 30 40 5010

    -2

    10-1

    100

    101

    Number of replicas ( ri )

    Ave

    rage

    ava

    ilabi

    lity

    late

    ncy

    ( δi )

    Sparse-approximationSimulation Dense-approximation

    Figure 8 Average availability latency for a data item as a func-tion of its replicas for a finite trip duration γ of 10. The simulationcurves are plotted along with the sparse and dense approxima-tions for finite trip duration for a 10 × 10 torus, when the numberof cars is set to 50

    obtained by simulations. In this case, the dense approxi-mation is also valid for a low density of replicas becausethe finite trip duration γ limits the maximum value ofthe availability latency. For a low density of replicas inmost cases the latency will be higher than γ and henceit will be bounded by γ . For a higher replica density, thevalue of γ is not as significant since the latency for thatitem will be much lower than γ .

    5.2 Simulation results

    Figure 9 depicts the latency performance for differentreplication schemes when storage per car is increasedfrom 4 to 25 slots when the trip duration is set as 10.When storage per car is low, α = 4, this represents aconstrained storage scenario. The linear scheme thatallocates more replicas to the popular data items showssuperior performance as compared to the square-rootscheme. This is because in such scenarios the replicasper data item is small, hence, only data items havinga larger number of replicas will provide a latency lessthan γ . Since the popular data items are the onesthat requested more often allocating more replicas forthese items lowers the aggregate availability latency.Contrast this scenario with the case of unbounded trip

    �Figure 9 Aggregate availability latency for different replicationstrategies for a 10 × 10 torus for a finite trip duration of 10 whenT = 100 and N = 50. Figures a, b, and c depict three differentstorage values per car: {4,10,25}

    0 0.2 0.4 0.6 0.8 10

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    Replication Exponent (n)

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y (δ

    agg)

    w=-0.5w=-1.0w=-1.5w=-2.0

    0 0.2 0.4 0.6 0.8 10

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    Replication Exponent (n)

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y (δ

    agg)

    w=-0.5w=-1.0w=-1.5w=-2.0

    0 0.2 0.4 0.6 0.8 10

    1

    2

    3

    4

    5

    6

    7

    8

    9

    10

    Replication Exponent (n)

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y (δ

    agg)

    w=-0.5w=-1.0w=-1.5w=-2.0

    ( a ) α = 4

    ( b ) α = 1 0

    ( c) α = 2 5

  • Mobile Netw Appl

    duration where a square-root replication scheme al-ways provided the minimum latency (see Fig. 5).

    The optimal scheme here is a super-linear one whichallocates most of the replicas to the first few popularitems after satisfying the constraint that at least onecopy of every item must be present in the network.For a highly skewed scenario, w = −2, allocating all theremaining storage for the most popular item minimizesthe latency. This is because most of the popularityweight is associated with the most popular item whichis requested very often.

    As the storage per car is increased further the curvesstart becoming flatter and at α = 25, see Fig. 9c, areplication scheme characterized by an exponent in therange, 0.3 ≤ n ≤ 1.0, shows near optimal performance.This is because the storage is abundant enough for allthese schemes to allocate a copy of the popular dataitems to every car bringing the latency for these items to0. The difference in the replicas allocated for the lesserpopular data items has minimal effect on the aggregateavailability latency on account of their lower requestrate. Recall, the frequency of access to the data itemsfollows a zipf distribution that depicts a heavy-tailedbehavior.

    6 Data items with large size and long clienttrip duration

    All the results presented so far considered a homoge-neous repository of data items with a display time (�)of one. In this section, we consider data items with ahigher �. In such cases, the latency encountered by aclient’s request is given by the earliest time when a con-tiguous block of � cells containing at least one replicaof the requested item is encountered. Here, we considerscenarios with long client trip durations and present acurve-fit based approximation that captures the rela-tion between the average availability latency and thenumber of data item replicas. Hence, we present theoptimal replication exponent that minimizes aggregateavailability latency in the case of large data items andlong client trip durations.

    Figure 10 depicts the average availability latency fora data item with a higher display time (� = {2, 3, 4, 5})as a function of the replicas for that item. For a givendata item replica density, the latency increases with thedisplay time. As expected the latency reduces with in-crease in the replicas. A simple curve-fit on the latencycurves for all the � values yields a close-match withthe expression of the form δi = Crσi where σ representsthe exponent for a given data item display time and

    0 2 4 6 8 10

    101

    102

    103

    104

    Number of replicas (ri)

    Ave

    rage

    Ava

    ilabi

    lity

    Late

    ncy

    (δi)

    Δ=2Δ=3Δ=4Δ=5

    σ=2.4843

    σ=2.0848

    σ=1.7007

    σ=1.3414

    Figure 10 Average availability latency for a data item as afunction of its replicas for different data item display times fora 10 × 10 torus. The latency is given by Crσi where the exponent σincreases with data item display time

    C is a constant that is a function of the size of thetorus. Note that the value of σ for data items with adisplay time of one is one (see Eq. 6). The values ofσ increase with data item display time. This indicatesthat an increase in the replicas provides a larger dropin the latency for a data item with a higher displaytime. Intuitively, encountering a replica in a contiguousblock of � cells becomes more and more difficult as� increases. Hence, an increase in the replica densityprovides a faster reduction in the latency for the higher� items. This is captured by the increasing value of σwith �.

    The specific formulation δi = Crσi has special signifi-cance since it can be plugged in directly into the op-timization formulation in Section 4.1.1 to determinethe optimum replication scheme that minimizes theavailability latency in case of data items with higherdisplay times.

    Remark In case of a sparse density of vehicles, witha repository of data items with higher display times(� > 1), the replication exponent n that minimizes theaggregate availability latency is such that n < 0.5.

    Following a similar procedure as the proof listedin Theorem 1, we obtain the optimal replication ex-ponents for the σ values capturing higher data itemdisplay times in Fig. 10. Table 2 lists the display timesand the corresponding approximate optimal exponentvalues.

  • Mobile Netw Appl

    Table 2 Approximate optimal replication exponents for dataitems with higher data item display times

    � σ n = 1σ+1

    2 1.3414 0.42713 1.7007 0.37034 2.0848 0.32425 2.4843 0.287

    7 Data items with large size and short clienttrip duration

    In this section, we consider scenarios with short clienttrip durations with data items having higher displaytimes. This implies that the client is only willing to waitfor a short period for its request to be satisfied (denotedby γ ). Otherwise the request is tagged with a latencyequal to γ .

    As with the previous simulations, we assume that thecars employ a 2D random walk based mobility model.Here, we set the client trip duration, γ , as 6, N = 200,and T = 100. We simulated a skewed distribution ofaccess to the T data items using a zipf distribution witha mean of 0.27. The distribution is shown to correspondto sale of movie theater tickets in the United States [6].

    Initially, all cars are distributed across the cells ofthe map as per the steady state distribution which isdetermined by a random number generator initializedwith a seed. Depending on the particular replicationtechnique, the replicas for each data item are calculatedusing Eq. 2 and then distributed across the car. A caronly contains a maximum of one replica for a particulardata item. The allocation of data item replicas acrossthe cars is uniform. At each step, depending on thecurrent car location, it moves to one of its adjoining cell(including itself) as governed by the mobility model.Another seed determines the choice of which cell a carmoves to. Since γ = 6, each car performs six transitionsaccording to the mobility model. We performed thecomparisons for several different data item distributionseeds starting from the same initial car positions. Next,we varied the initial car positions by changing the initialseed. Specifically, we chose 50 different initial seedsand for each of these we used 50 seeds that decidethe distribution of the data item replicas among thecars. Thus, each point in all the presented results is anaverage of 2500 simulations.

    Below is an overview of the key lessons learned fromthese experiments with higher data item display timesand a short trip duration.

    (a) The optimal value of n varies as a function ofthe scarcity of the network storage (b) When storageis scarce, the optimal aggregate availability latency is

    realized by using a higher value of n. (c) Even a randomscheme with n = 0 is good enough when storage isabundant relative to the repository size.

    When storage is extremely scarce, with larger dataitem sizes (� > 1), linear (n = 1) scheme provides thebest performance. This is because it allocates morereplicas for the popular data items at the cost of as-signing very few for the remaining data items. In thiscase, the contribution to δagg is a function of the δ forthe more popular data items since for the less populardata items there will be insufficient replicas to reducetheir δ. On the other hand, since the random scheme isblind to the data item access frequencies, on an average,it assigns equal number of replicas for each data itemthereby providing the worst performance.

    The square root (n = 0.5) scheme assigns fewerreplicas for the popular data items than the linearscheme. As we increase the amount of storage, there isa cut-off point along the storage axis, where allocatingmore replicas for the popular data items provides neg-ligible improvement in δagg. It is beyond this point thatthe square root scheme starts outperforming the linearscheme. This is because the square root scheme can usethe extra storage savings for allocating replicas for theless popular data items thereby reducing their δ.

    To illustrate, Fig. 11 shows the variation of δagg asa function of α for � = 4. Since δagg is a function ofthe value of n, hence, here we denote it as δagg(n =i). For Fig. 11b, the y-axis represents the percent-age comparison of the linear (n = 1) and the random(n = 0) schemes with respect to the square root (n =0.5) scheme calculated as, � =

    (δagg(n=i)−δagg(n=0.5)

    δagg(n=0.5))

    ×100; where i = {0, 1}.

    Figure 11b shows two distinct regions in which theschemes with n = 0.5 and n = 1 perform well undercertain parameter settings within the design space. Forα

  • Mobile Netw Appl

    0 20 40 60 80 1000

    1

    2

    3

    4

    5

    6

    Storage slots per car (α)

    sqrtlinearrandom

    Aggregate Availability Latency (δagg

    )

    (a) δ agg

    0 50 100 150 200 250 300 350-100

    -50

    0

    50

    100

    150

    200

    Storage slots per car (α)

    Latency wrt sqrt scheme (%)

    linear (n=1)

    random (n=0)

    Region I

    Region II

    (b) % Comparison wrt sqrt scheme (Γ)

    Figure 11 a shows δagg of the sqrt, linear and random replicationschemes versus α for � = 4 and N = 200. b shows the percentcomparison of the linear and random schemes wrt the sqrtscheme for this scenario. Region I and Region II, respectively,indicate the parameter space where n = 1 and n = 0.5 performthe best

    illustrated in Fig. 11b with N = 200, T = 100, and � =4, the storage threshold is around 360 slots per car. For� = 5, and 6, this threshold is approximately 450 and540, respectively. These are loose upper bounds.

    8 Evaluation with a map of San Francisco

    In this section, we describe the performance of thevarious replication schemes when the vehicular move-

    ments are dictated by an underlying map of the SanFrancisco Bay Area depicting major freeways and theirintersections. We superimpose a 2D-grid on this mapand the individual cells are labelled with the respectivefreeway id that they cover. This 2D-grid serves to cap-ture the underlying map at a coarse granularity. Mostof the probability mass is concentrated on the cellsthat represent the major freeways. The non-labelledcells have equal transition probabilities to each of itsneighboring eight cells.

    The outgoing transition probabilities at a cell thatrepresents an intersection between two freeways arecalculated as follows. As an example, consider theintersection of the freeways 880 and 85 as shown inFig. 12. We obtained the traffic density seen on the free-ways before and after the intersection from Caltransdata provided by the California department of trans-portation [24]. The website allowed real time gatheringof vehicle traffic data. We considered a time windowbetween 7–8 pm for a particular week and averagedthe vehicular density seen during this period. The day-to-day statistics were quite similar, here, we show anexample of how the actual data was converted into theprobability transition values that formed the basis ofthe Markov mobility model. Similar calculations wereemployed to populate the entire transition probabilitymatrix. Finally, we converted the 15 × 15 grid into atorus by allowing cars at the boundaries to appear atthe opposite ends with equal transition probabilities.

    The transition matrix was used to generate the carmovements. We provide a notion of directionality tothe car movements by ensuring that the next step for acars movement takes into account both the current cellas well as the previous cell which a car traversed. Thisis done by storing both the cell ids as part of the stateof the Markov chain. Consequently, the flip-flop move-ments of the cars is avoided thereby ensuring that carmovements are constrained by the underlying freewaystructure of the map and are not entirely random. We

    Figure 12 The intersection between freeways 880 and 85 iscaptured in the figure along with the equivalent probabilitytransitions in the Markov model based on data obtained fromCaltrans regarding the vehicular densities

  • Mobile Netw Appl

    used these car movements to investigate the relativeperformance of the various replication schemes undersuch a scenario.

    8.1 Results with replication schemes

    In this section, we present some representative resultsfor the various replication schemes obtained by em-ploying the Markov mobility model previously derivedfrom a map of the San Francisco Bay area. Employingaggregate availability latency as the chief performancemetric for comparison between the linear, square-root,and random schemes; we explored the following para-meter space: storage per car, data item repository size,and car density. As an example illustration, we present

    0 50 100 150 200 2500

    1

    2

    3

    4

    5

    6

    7

    8

    9

    Number of cars

    Agg

    rega

    te A

    vaila

    bilit

    y La

    tenc

    y

    linearsqrt random

    (a)

    0 50 100 150 200 2500

    5

    10

    15

    20

    25

    Number of cars

    Late

    ncy

    perf

    orm

    ance

    wrt

    line

    ar

    sqrt random

    (b)

    Figure 13 Performance of various replication schemes as afunction of car density when T = 25, α = 2, and γ = 10. Figureb shows the performance wrt the linear scheme

    performance results with variation in car density. Asbefore, requests are issued, one at a time at each time-step at vehicles in a round-robin manner, as per aZipf distribution with a mean of 0.27. For data itemrepository size T set as 25, client trip duration, γ , set as10, storage per car, α, set as 2, the latency performancewith the various replication schemes is studied as afunction of increasing car density N (see Fig. 13).

    In all cases, the main conclusion is that the linearreplication scheme shows superior performance as seenin Section 5 for the case with data item size = 1 andfinite client trip duration. The trends seen with thismodel are similar to those seen with a uniform Markovmobility model with equal transition probabilities. Thisis because the map for the former causes some cells tobe visited more than others, however, the movementsof the vehicles still remain essentially Markovian. Thisresult suggests that the uniform probability transitionmatrix based Markov model may be a good indicator ofthe performance that may be seen with a model derivedfrom real maps.

    9 Evaluation with real movement traces

    In this section, we evaluate the latency performanceof the static replication schemes using traces obtainedfrom a bus-based DTN test-bed called UMassDiesel-Net [4]. First, we briefly describe the test-bed andpresent some properties of the mobility model followedby the buses. Then, we describe the details of the exper-imental set-up and the results comparing the square-root, linear, and random replication schemes underdifferent parameter settings using these traces.

    9.1 UMassDieselNet traces

    In this section, we briefly describe the details of theUMassDieselNet test-bed and present some propertiesof the mobility model that characterizes the movementof the vehicles that are part of the test-bed. The UMass-DieselNet network operates daily around the universitycampus and the surrounding county. It comprises of 30buses equipped with a Linux based computer coupledwith a IEEE 802.11b wireless interface that permits ad-hoc communication between the buses when they arein radio range. An IEEE 802.11b access point is alsoconnected to the brick computer that allows DHCPaccess to passengers within the bus. The traces areavailable for a period of 60 days, the logs describe everyencounter between every pair of buses that occurredduring the day. The identity of the buses involved in theencounter, the time of encounter and amount of data

  • Mobile Netw Appl

    0 500 1000 1500 20000

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Time between encounters

    CD

    F

    Figure 14 The CDF of the time between encounters averagedacross all the traces for the UMassDieselNet data set

    transmitted during the encounter are logged in the tracefiles. Certain buses had long routes while others hadshort ones. Unfortunately, due to technical difficulties,the GPS device on the buses were unable to providedetails about the bus locations.

    The number of buses that were active on each dayof the 60 day period during which the traces werecollected varied from as low as 6 to as high as 24. Weonly consider traces where the number of active buseswas greater than 15. This accounted for 52 traces. Ingeneral, the traces indicated a sparse density of buseswith a high degree of locality in the encounters. In otherwords, if two buses encounter each other at the begin-ning then they will continue to encounter each othermore frequently than other buses. This is captured inFig. 14 where we plot the CDF of the time betweenconsecutive encounters of the same pair of buses.

    9.2 Experimental set-up

    In this section, we describe the details of the simulationset-up used for evaluation of the replication schemesemploying the UMassDieselNet traces. Each trace rep-resents the movements of buses during that particularday. There is no correlation between trace movementsacross days. Hence, we process each trace at a time andthen average the results observed across all the daysnoting that the average is indicative of the performanceseen on most days. However, certain days do appear asoutliers since the number of active buses differs fromday-to-day.

    As before, we consider a finite data item repositoryof size T. Each bus is assumed to carry α storage slots.Replicas for each data item are determined based on a

    replication scheme and then allocated across the busesuniformly at random. The constraint is that at least onecopy of every data item must be present in the networkat all times. We consider the three representative repli-cation schemes: random, square-root, and linear andstudy the relative performance of the schemes.

    Since the buses only operate for a finite amountof time we consider two separate metrics (i) Averageavailability latency for satisfied requests (ii) Normal-ized unsatisfied request rate. Requests for the T ti-tles are generated as per a Zipf distribution with anexponent w = −0.73. The duration during which thebuses were active during a day is determined aprioriand subject to this duration, requests are issued at equalinter-arrival times. A generated request is assigned to abus chosen uniformly at random. A request is assumedto be satisfied either if the data item requested is locallystored or another bus carrying the requested item isencountered at some point after the request is issued.Those requests that are not satisfied at the end of theday are tagged as unsatisfied requests.

    9.3 Results

    We briefly describe the main results from evaluationof the performance of the replication schemes usingthe UMassDieselNet traces. For the first set of exper-iments, we vary the values of (T,α) as {(5,1), (10,2),(15,3), (20,4), (25,5)}, see Fig. 15. The linear replica-tion scheme provides the lowest average availabilitylatency for satisfied requests (about 10 − 25% betterthan the square-root scheme). The linear and square-root scheme show similar performance in terms of thenormalized unsatisfied requests. The random schemeshows poor performance both in terms of latency aswell as the normalized unsatisfied requests.

    Similar results are obtained when the storage per caris kept fixed and the size of the data item repository isincreased. The mobility model provided by the tracesrepresents an extremely sparse density of buses whereinherently there is a limit to the maximum amountof time for request satisfaction (namely the last en-counter time on the trace). The finite trip duration inconjunction with the low density and encounter modelfavors a linear scheme which allocates more replicasfor the popular data items. The popular data items arerequested more frequently and within the finite time forrequest satisfaction, have a higher probability of beingsatisfied on account of the larger number of replicas.The square-root scheme tries to allocate replicas lessaggressively to the more popular data items in favor ofthe less popular ones. This hurts its performance sincethe less popular data items have a very low probability

  • Mobile Netw Appl

    0 5 10 15 20 250

    1000

    2000

    3000

    4000

    5000

    6000

    7000

    8000

    9000

    10000

    Number of Titles (T)

    Ava

    ilabi

    lity

    Late

    ncy

    for

    Sat

    isfie

    d R

    eque

    sts

    (sec

    s)

    RandomSqrt Linear

    (a)

    0 5 10 15 20 250

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    Number of Titles (T)

    Uns

    atis

    fied

    Req

    uest

    Met

    ric

    RandomSqrt Linear

    (b)

    Figure 15 Aggregate availability latency for satisfied requestsand the aggregate unsatisfied request metric for the random,square-root, and linear replication schemes are shown in a andb respectively. The ratio of the storage per car to the data itemrepository size, αT is maintained as 1:5

    of being satisfied. Notwithstandingly, in such scenarios,a random scheme that allocates replicas equally acrossthe data items shows the worst performance.

    We now consider an equivalent scenario with theMarkov mobility and study its properties in terms of thetime between encounters. The aim is to capture a simi-lar scenario as depicted by the UMassDieselNet traces(compare with Fig. 14). We consider a similar set-upto experimental scenario described in Fig. 15. Similarto the trends seen with the traces, Fig. 16 shows thatthe linear replication scheme outperforms the square-root and the random schemes in terms of the latencyfor satisfied requests. The performance in terms of thenormalized unsatisfied requests is quite similar for thethree schemes. These results suggest that the results

    0 5 10 15 20 250

    50

    100

    150

    200

    250

    300

    Number of Titles (T)

    Ava

    ilabi

    lity

    Late

    ncy

    for

    Sat

    isfie

    d R

    eque

    sts

    linearsqrtrandom

    (a)

    0 5 10 15 20 250

    0.1

    0.2

    0.3

    0.4

    0.5

    Number of Titles (T)

    Uns

    atis

    fied

    Req

    uest

    Met

    ric

    linearsqrtrandom

    (b)

    Figure 16 Aggregate availability latency for satisfied requestsand the aggregate unsatisfied request metric as obtained froman equivalent scenario employing the Markov model. The ratioof the storage per car to the data item repository size, αT ismaintained as 1:5 (a, b)

    obtained from the Markov mobility model may beapplicable across a vast range of scenarios comprisingdifferent mobility models. Adequate adjustment to thetransition probabilities of the Markov model may en-able this model to suitably capture the mobility trendsof other models such as Manhattan, Highway, Randomway-point etc.

    10 Conclusions, discussion, and futureresearch directions

    In this study, we have systematically explored how datareplication impacts the user latency to desired content

  • Mobile Netw Appl

    in a vehicular network. We presented an optimizationformulation that minimizes availability latency across arepository of data items subject to storage constraintsper vehicle. The solution to this optimization yieldsthe optimal replication scheme, characterized by anexponent n that defines the number of replicas for adata item as a function of its popularity. Using math-ematical analysis and simulations, we showed how theoptimal replication exponent varied as a function ofthe three critical factors: data item size, the client tripduration, and the total storage in the system. Whilethe evaluation above was based on the assumption ofa random walk mobility model for the vehicles, wevalidated a subset of our observations employing tworeal data sets. The first was based on a city map withfreeway traffic information while the second employedtraces listing encounters between buses that were partof a small test-bed.

    First, while our study indicates that replication hasa significant impact on latency performance in anAutoMata-based application, we did not directly ad-dress how a replication scheme may be realized. Thispresents a fruitful future research direction. A promis-ing approach is to employ a two-tier architecture [8]comprising of a low-bandwidth control plane betweenthe base-station and vehicles, and a high-bandwidthdata plane representing the ad-hoc, peer-to-peer net-work between vehicles. While all the data exchangetakes place via the data plane, the control plane en-ables centralized information gathering at a suitabledispatcher. The dispatcher is aware of information suchas the total number of cars, the available storage percar, the data item requests etc. On the basis of thetarget replication scheme to be realized, the dispatchercomputes the total number of replicas to be maintainedfor each data item. Hence, every time a request is satis-fied, depending on the current replicas of the requesteditem in the system, the dispatcher decides whether aclient vehicle should create a new copy of that item.If such a process is followed, then after a cold-startphase where replicas are being created and deleted,the data item replica distribution will approximate thetargeted one. Second, this study can be extended interms of heterogeneity with respect to several para-meters. The data item repository may have a mix ofdifferent sizes, the vehicles may have different amountsof available storage and different trip durations de-pending on their data item request preferences, andpopularity distribution of the data items may changeover time, especially when new items are introduced inthe system. Each of these extensions adds another di-mension in terms of practicality towards realizing such asystem.

    Third, we do not explicitly address contention andbandwidth issues within our model. Recently, Jindaland Psounis [21] have provided a contention model thatcan be easily incorporated in our study promising aricher evaluation.

    Fourth, our study is based on static replicationschemes, in that we do not explore any dynamic datare-organization schemes that reactively or pro-activelychange the replica distribution depending on the re-quested items. In a related study, once static replicationschemes have allocated replicas for the various dataitems, vehicles are employed as data carriers [9] todeliver data items from server vehicles carrying itemsto clients requesting them. The study shows that signifi-cant latency improvements can be obtained by employ-ing these intermediate data carriers.

    Finally, vehicular ad-hoc networks are an emergingarea and many of the results presented here may begeneral enough to be applicable in a wide variety ofcontexts such as in intermittently connected mobilenetworks, mobile sensor networks, and other delaytolerant networks.

    Acknowledgements This research was supported by NSFgrants numbered CNS-0347621 (CAREER), CNS-0435505(NeTS NOSS), National Library of Medicine LM07061-01, IIS-0307908, and an unrestricted cash gift from Microsoft Research.

    References

    1. Aldous D, Fill J (2008) Reversible markov chains and ran-dom walks on graphs. http://stat.www.berkeley.edu/users/aldous/RWG/book.html

    2. Bararia S, Ghandeharizadeh S, Kapadia S (2004) Evaluationof 802.11a for streaming data in ad-hoc networks. In: Proc.of the 4th workshop on applications and services in wirelessnetworks (ASWN), Boston, 8–11 August 2004

    3. Breaslau L, Cao P, Fan L, Phillips G, Shenker S (1999)Web Caching and Zipf-like Distributions: Evidence and Im-plications. In: Proc. of IEEE Infocom, pp 126–134, IEEE,Piscataway

    4. Burgess J, Gallagher B, Jensen D, Levine B (2006) MaxProp:routing for vehicle-based disruption-tolerant networking. In:Proc. of IEEE Infocom, Barcelona, April 2006

    5. Cohen E, Shenker S (2002) Replication strategies in unstruc-tured peer-to-peer networks. In: Proc. of ACM SIGCOMM.ACM, New York, pp 177–190

    6. Dan A, Dias D, Mukherjee R, Sitaram D, Tewari R (1995)Buffering and caching in large-scale video servers. In: Proc.of COMPCON, San Francisco, 5–9 March 1995

    7. Ghandeharizadeh S, Helmi T (2003) An evaluation ofalternative continuous media replication techniques in wire-less peer-to-peer networks. In: Proc. of the 3rd internationalACM workshop on data engineering for wireless and mo-bile access (MobiDE, in conjunction with MobiCom’03),San Diego, September 2003

    8. Ghandeharizadeh S, Kapadia S, Krishnamachari B (2004)PAVAN: a policy framework for content availabilty in ve-

    http://stat.www.berkeley.edu/users/aldous/RWG/book.htmlhttp://stat.www.berkeley.edu/users/aldous/RWG/book.html

  • Mobile Netw Appl

    hicular ad-hoc networks. In: VANET ’04: Proc. of the1st ACM international workshop on Vehicular ad hoc net-works, Philadelphia, PA, USA. ACM, New York, NY, USA,pp 57–65. doi:10.1145/1023875.1023885

    9. Ghandeharizadeh S, Kapadia S, Krishnamachari B (2006)An evaluation of availability latency in carrier-based wehic-ular ad-hoc networks. In: Proc. of the 5th ACM interna-tional workshop on Data engineering for wireless and mobileaccess. ACM, New York, pp 75–82

    10. Ghandeharizadeh S, Krishnamachari B (2004) C2P2: a peer-to-peer network for on-demand automobile information ser-vices. In: Proc. of the 1st international workshop on grid andpeer-to-peer computing impacts on large scale heterogeneousdistributed database systems (Globe’04), San Francisco,September 2004

    11. Ghandeharizadeh S, Krishnamachari B, Song S (2004) Place-ment of continuous media in wireless peer-to-peer networks.IEEE Trans Multimedia 6:335–342

    12. IEEE 802.16 Working Group (2004) Emerging IEEE 802.16WirelessMAN standards for broadband wireless access. IntelTechnol J 8(3)

    13. Hara T (2001) Effective replica allocation in ad hoc networksfor improving data accessibility. In: Proc. of IEEE Infocom.IEEE, Piscataway, pp 1568–1576

    14. Hara T (2002) Replica allocation in ad hoc networks withperiodic data update. In: Proc. of the 3rd international confer-ence on mobile data management(MDM). IEEE ComputerSociety, Washington, DC, pp 79–86

    15. Hara T (2003) Replica allocation methods in ad hoc networkswith data update. ACM/Kluwer J Mob Netw Appl(MONET)8(4):343–354

    16. Hara T (2005) Location management of data items inmobile ad hoc networks. In: Proc. of the 2005 ACM sym-posium on applied computing (SAC). ACM, New York,pp 1174–1175

    17. Hara T (2005) Strategies for data location management inmobile ad hoc networks. In: Proc. of IEEE internationalconference on parallel and distributed systems (ICPADS),Minneapolis, 12–15 July 2006

    18. Hara T, Madria S (2004) Dynamic data replication usingaperiodic updates in mobile ad-hoc networks. In: Proc. ofinternational conference on database systems for advancedapplications (DASFAA), Jeju Island, 17–19 March 2004, pp869–881

    19. Hara T, Loh Y-H, Nishio S (2003) Data replication methodsbased on the stability of radio links in ad hoc networks. In:Proc. of international workshop on mobility in databases anddistributed systems (MDDS), pp 969–973

    20. Hayashi H, Hara T, Nishio S (2005) A replica allocationmethod adapting to topology changes in ad hoc networks.In: Proc. of international conference on database and expertsystems applications (DEXA)

    21. Jindal A, Psounis K (2006) Performance analysis of epidemicrouting under contention. In: Proc. of international con-ference on communications and mobile computing. ACM,New York, pp 539–544

    22. Kapadia S, Krishnamachari B, Ghandeharizadeh S (2006)Static replication strategies for content availability in vehicu-lar ad-hoc networks. Technical report, University of SouthernCalifornia

    23. Lochert C, Hartenstein H, Tian J, Fler H, Hermann D,Mauve M (2003) A routing strategy for vehicular ad hoc net-works in city environments. In: Proc. of the IEEE intelligentvehicles symposium, Columbus, June 2003, pp 156–161

    24. California Department of Transportation (2008) Caltrans.http://www.dot.ca.gov/

    25. Spyropoulous A, Psounis K, Raghavendra C (2004) Single-copy routing in intermittently connected mobile networks. In:Proc. of IEEE SECON

    26. Sun W, Yamaguchi H, Kusumoto S (2006) A study on perfor-mance evaluation of real-time data transmission on vehicularad hoc networks. In: Proc. of the 7th international conferenceon mobile data management (MDM), p 126

    27. Wolf J, Yu P, Shachnai H (1995) DASD dancing: a disk loadblanacing optimization scheme for video-on-demand com-puter. In: Proc. of ACM SIGMETRICS. May pp 157–166

    28. Zipf GK (1935) The psychobiology of language. Houghton-Mifflin, Boston

    Shyam Kapadia received his M.S. and Ph.D. degrees in Com-puter Science from the University of Southern California in2003 and 2007 respectively. Prior to that he received his B.E.in Computer Engineering from the Mumbai University, India in2000. He is currently a software engineer at the Internet and Ser-vices Business Unit at Cisco Systems Inc. His primary researchinterests include designing efficient data delivery mechanismsfor wireless ad-hoc and sensor networks including vehicular net-works.

    Bhaskar Krishnamachari received his B.E. in Electrical Engi-neering at The Cooper Union, New York, in 1998, and his M.S.and Ph.D. degrees from Cornell University in 1999 and 2002respectively. He is currently the Philip and Cayley MacDonaldEarly Career Chair Assistant Professor in the Department ofElectrical Engineering at the University of Southern California.His primary research interest is in the design and analysis ofefficient mechanisms for operating wireless sensor networks.

    http://doi.acm.org/10.1145/1023875.1023885http://www.dot.ca.gov/

  • Mobile Netw Appl

    Shahram Ghandeharizadeh received his Ph.D. degree in com-puter science from the University of Wisconsin, Madison, in

    1990. Since then, he has been on the faculty at the Universityof Southern California. In 1992, he received the National ScienceFoundation Young Investigator’s Award for his research on thephysical design of parallel database systems. In 1995, he receivedan award from the School of Engineering at USC in recognitionof his research activities. His primary research interests includedesign and implementation of multimedia storage managers,parallel database systems, and active databases. He has servedon the organizing committees of numerous conferences and wasthe general co-chair of ACM Multimedia 2000. His activities aresupported by several grants from the National Science Foun-dation, Department of Defense, Microsoft, BMC Software, andHewlett-Packard. He is the director of the database laboratory atUSC. He was a member of ACM SIGMOD executive committeeand the Editor-in-Chief of ACM SIGMOD DiSC from 2003 to2006. He also serves as a member of the Council of Directors ofKodaikanal International School.

    Static Replication Strategies for Content Availability in Vehicular Ad-hoc NetworksAbstractIntroductionRelated workGeneral frameworkModel preliminaries

    Data items with small size and long client trip durationAnalysis of data items with display time = oneSparse scenarioDense scenario

    Simulation results of data items with display time = oneVariation in car density

    Data items with small size and short client trip durationAnalysisSparse approximationDense approximation

    Simulation results

    Data items with large size and long client trip durationData items with large size and short client trip durationEvaluation with a map of San FranciscoResults with replication schemes

    Evaluation with real movement tracesUMassDieselNet tracesExperimental set-upResults

    Conclusions, discussion, and future research directionsReferences

    /ColorImageDict > /JPEG2000ColorACSImageDict > /JPEG2000ColorImageDict > /AntiAliasGrayImages false /CropGrayImages true /GrayImageMinResolution 150 /GrayImageMinResolutionPolicy /Warning /DownsampleGrayImages true /GrayImageDownsampleType /Bicubic /GrayImageResolution 150 /GrayImageDepth -1 /GrayImageMinDownsampleDepth 2 /GrayImageDownsampleThreshold 1.50000 /EncodeGrayImages true /GrayImageFilter /DCTEncode /AutoFilterGrayImages true /GrayImageAutoFilterStrategy /JPEG /GrayACSImageDict > /GrayImageDict > /JPEG2000GrayACSImageDict > /JPEG2000GrayImageDict > /AntiAliasMonoImages false /CropMonoImages true /MonoImageMinResolution 600 /MonoImageMinResolutionPolicy /Warning /DownsampleMonoImages true /MonoImageDownsampleType /Bicubic /MonoImageResolution 600 /MonoImageDepth -1 /MonoImageDownsampleThreshold 1.50000 /EncodeMonoImages true /MonoImageFilter /CCITTFaxEncode /MonoImageDict > /AllowPSXObjects false /CheckCompliance [ /None ] /PDFX1aCheck false /PDFX3Check false /PDFXCompliantPDFOnly false /PDFXNoTrimBoxError true /PDFXTrimBoxToMediaBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXSetBleedBoxToMediaBox true /PDFXBleedBoxToTrimBoxOffset [ 0.00000 0.00000 0.00000 0.00000 ] /PDFXOutputIntentProfile (None) /PDFXOutputConditionIdentifier () /PDFXOutputCondition () /PDFXRegistryName () /PDFXTrapped /False

    /Description >>> setdistillerparams> setpagedevice