
Targeted and Scalable Information Dissemination in a Distributed Reputation Mechanism

Rahim Delaviz, Delft University of Technology, The Netherlands, [email protected]
Johan A. Pouwelse, Delft University of Technology, The Netherlands, [email protected]
Dick H.J. Epema, Delft University of Technology, The Netherlands, [email protected]

ABSTRACT
In online reputation mechanisms, providing the system participants (peers) with the appropriate information on previous interactions is crucial for accurate reputation evaluations. A naive way of doing so is to provide all peers with all information, regardless of whether they need it or not, which may be very costly and not scalable. In this paper we propose a similarity-based approach, named SimilDis, for targeted dissemination of information in the distributed reputation mechanism called BarterCast. In BarterCast, each peer collects information on the interactions (data transfers) that have occurred in the system, and builds a weighted directed graph that represents its partial view of the system. We propose two methods to derive peer similarity in the partial graph of a peer. The first method is based on incrementally maintaining a directed acyclic graph, and the second method is based on performing multiple nonuniform random walks in the partial graph. In both methods, each peer maintains a list of the peers most similar to itself, and gives higher priority to them when disseminating information. We evaluate the accuracy and the cost of these methods using trace-driven simulations based on traces from the Tribler P2P file-sharing network, which employs BarterCast. As the results show, both methods exhibit very small errors in the computed reputations in comparison with the case of providing complete knowledge to all peers, but decrease the communication and storage costs by two orders of magnitude.

Categories and Subject Descriptors
H.3.4 [INFORMATION STORAGE & RETRIEVAL]: Selective dissemination of information (SDI)—Distributed systems

Keywords
reputation mechanisms, distributed systems, scalable dissemination

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
STC'12, October 15, 2012, Raleigh, North Carolina, USA.
Copyright 2012 ACM 978-1-4503-1662-0/12/10 ...$15.00.

1. INTRODUCTION
Providing efficient reputation management mechanisms at scale is an important step toward establishing trust in many distributed systems, such as file-sharing systems. A typical online reputation mechanism is composed of three main components: Formulation, Calculation, and Dissemination [15]. The dissemination component provides the other components with the required information to operate. More specifically, in reputation mechanisms in which the calculation component uses information from other participants (peers) on interactions in the system as input, peers will not be able to evaluate accurate reputations without an effective spread of this information. From the point of view of reputation accuracy, providing peers with more information is preferred, but from the point of view of scalability, uncontrolled and blind dissemination can be problematic in terms of communication, computation, and storage costs. This paper deals with this trade-off in large-scale distributed reputation systems by providing a scalable dissemination method in the BarterCast reputation mechanism [27] of the Tribler [29] peer-to-peer (P2P) file-sharing system. In BarterCast, reputation values are used by a peer to decide whether or not to upload to a peer requesting data from it.

In online reputation mechanisms, information dissemination spans the spectrum from zero to full dissemination, see Figure 1. With zero dissemination, participants only use their own direct experiences for evaluating reputations, and no information on interactions is spread. Such a mechanism works if the participants interact frequently with each other, and if having only the direct interactions is enough to have an accurate prediction about a counterparty's future behavior. At the other extreme of the spectrum is full dissemination, where all participants receive information of all previous interactions. Although from the accuracy point of

Figure 1: The information dissemination spectrum and the accuracy and scalability curves.


view this is desirable, full dissemination does not scale, and may even be unnecessary.

In large-scale online systems such as P2P file-sharing systems, peers will only interact with a subset of all peers, for instance, those peers who have similar tastes with respect to the content available in the system. Providing peers with information on all peers and interactions is then very inefficient. Rather, information dissemination targeted at similar peers may then be sufficient and much more efficient, which is especially important for power-, memory-, and computation-constrained mobile devices. BarterCast [27] is a distributed reputation mechanism based on an epidemic protocol that is used in the BitTorrent-based file-sharing client Tribler [29]. In BarterCast, peers build a partial graph with peers as nodes and interactions they have learnt about as edges. It has been shown [7] that performing full dissemination of all interactions between peers improves BarterCast's accuracy. However, this full dissemination approach incurs high operational costs. In this paper, we propose a new low-cost dissemination mechanism for BarterCast called SimilDis, which, without providing a full view to all peers, leads to highly accurate reputation evaluations.

In SimilDis, we use either of two methods to compute peer similarity values, one of which is deterministic and the other non-deterministic. In the first, introduced in this paper, each peer builds a labeled similarity graph, which is a directed acyclic graph (DAG), from its partial graph, in which the labels indicate the similarities with the local peer. The second method is based on performing multiple non-uniform random walks (RW) in a peer's partial graph; the number of times a node is visited is then a measure of its similarity to the local peer. This method was already used in [35, 36]. Both methods are solely based on a peer's local information. In order to evaluate SimilDis, we simulate it using traces from the Tribler network, and we assess its accuracy in evaluating reputations and the incurred communication, computation, and storage costs. The results show that SimilDis, compared with full dissemination, yields very low reputation evaluation errors, and causes the communication costs and the average size of the partial graphs to be reduced by two orders of magnitude.

As for the paper outline, in Section 2 we give a general overview of our similarity-based dissemination protocol. After related work in Section 3, we present the design details in Section 4. In Section 5 we show how similarity values are dynamically updated in the DAG-based method. The experimental setup and results are covered in Sections 6 and 7, respectively.

2. GENERAL OVERVIEW
In this section, we first give a short introduction to the BarterCast reputation mechanism, and then a general overview of the SimilDis protocol.

2.1 The BarterCast Mechanism
There are two types of reciprocity mechanisms in distributed file-sharing systems: direct and indirect. In direct reciprocity, like tit-for-tat, upload bandwidth is exchanged for download bandwidth. Due to problems like free-riding and a lack of seeding [24], researchers have introduced indirect reciprocity mechanisms, where a contributing peer is rewarded by other peers in the network but no direct compensation is expected. The BarterCast mechanism belongs to this second class of mechanisms, and it is used by the Tribler BitTorrent client to rank peers according to their upload and download behavior. In this mechanism, a peer whose upload is much higher than its download gets a high reputation, and other peers give a higher priority to it when selecting a content bartering partner. In BarterCast, when two peers exchange content, they both log the cumulative amount of transferred data since the first data exchange and the identity of the corresponding peer in a BarterCast record. In Tribler, peers regularly contact other peers in order to exchange BarterCast records. Peer sampling for selecting to whom to send BarterCast records is done through a gossip protocol called BuddyCast [30].

From the BarterCast records it receives, each peer creates its own current local view of the upload and download activity in the system by gradually building its partial graph. The partial graph of peer i is the weighted directed graph Gi = (Vi, Ei), where Vi is the set of peers whose activities peer i has been informed about through BarterCast records, and Ei is the set of edges (u, v, w), with u, v ∈ Vi and with w the weight representing the total amount of data transferred from u to v. Upon receipt of a BarterCast record (u, v, w), peer i either adds a new edge to Gi if it did not know u and/or v, or updates the weight of the edge u → v if it already exists in Gi.
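For illustration, a partial graph can be kept as a simple weighted adjacency structure; the sketch below (in Python, with hypothetical names that do not come from the Tribler code base) shows how incoming BarterCast records might be applied. Whether an update overwrites or accumulates the reported amount is an assumption made here.

from collections import defaultdict

class PartialGraph:
    """Minimal sketch of a peer's partial graph G_i."""
    def __init__(self):
        self.nodes = set()
        self.weights = defaultdict(float)   # (u, v) -> data reported as transferred from u to v

    def apply_record(self, u, v, w):
        # Add u and v if they are unknown, then create or update the edge u -> v.
        # We simply overwrite with the latest reported cumulative amount (an assumption).
        self.nodes.update((u, v))
        self.weights[(u, v)] = w

# Example: peer i learns about a transfer and a later update for the same pair.
g_i = PartialGraph()
g_i.apply_record("p", "q", 512)
g_i.apply_record("p", "q", 768)
print(g_i.weights[("p", "q")])   # 768.0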

In order to calculate the reputation of an arbitrary peer j ∈ Vi at some time, peer i applies the maxflow algorithm [6] to its current partial graph to find the maximal flow from itself to j and vice versa. Maxflow is a classic algorithm in graph theory for finding the maximal flow from a source to a destination node in a weighted graph. When applying maxflow to the partial graph, we interpret the weights of the edges, which represent amounts of data transferred, as flows. The original maxflow algorithm by Ford-Fulkerson [6] tries all possible paths from the source to the destination, but in BarterCast only paths of length at most 2 or 4 are considered. If Φh(x, y) is the h-hop maxflow from x to y, then the non-negative subjective reputation of peer j from peer i's point of view is calculated as:

R_i(j) = \frac{\arctan(\Phi_h(j,i))}{\pi/2} \times \Bigl(1 - \frac{\arctan(\Phi_h(i,j))}{\pi/2}\Bigr),   (1)

and so R_i(j) ∈ [0, 1). If the destination node j is more than h hops away from i, then its reputation is zero.
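For concreteness, once the two h-hop maxflow values are available, Eq. (1) can be evaluated as in the following sketch; the maxflow computation itself (e.g., a depth-bounded Ford-Fulkerson routine) is assumed to exist elsewhere and is not shown.

import math

def reputation(maxflow_j_to_i, maxflow_i_to_j):
    """Subjective reputation R_i(j) of Eq. (1), given Phi_h(j, i) and Phi_h(i, j)."""
    gain = math.atan(maxflow_j_to_i) / (math.pi / 2)            # contribution of data flowing from j towards i
    penalty = 1.0 - math.atan(maxflow_i_to_j) / (math.pi / 2)   # discount for data flowing from i towards j
    return gain * penalty                                       # lies in [0, 1); 0 if j is unreachable

# Example: j has pushed 100 units towards i and consumed 10 units from i.
print(round(reputation(100.0, 10.0), 3))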

The security aspects of BarterCast have been studied by Seuken et al. [34], who view data transfer actions as work provided by peers and consider BarterCast a distributed accounting mechanism. Considering the security requirements of BarterCast, we have introduced a modified version of it which is resilient against sybil attacks and white-washing [8]. In this paper we use the same formulation as in that modified version.

2.2 SimilDis Overview
To have a scalable and accurate reputation mechanism and to provide the right information to the right peers, the selection of peers for sending records should be done carefully. One way to realize this goal is to use a technique similar to semantic clustering in distributed search mechanisms [23, 5]. In such a technique, peers are clustered into a number of groups based on a kind of semantic similarity, and when a peer initiates a query, it first sends it to its group members, and only if the reply is not satisfactory does it ask outsiders. A technique similar to semantic searching can be used in the spread of interactions in reputation mechanisms as well.

The BarterCast mechanism can be decomposed into the three components of dissemination, formulation, and calculation. The role of the dissemination component is to gather and provide the other components with the new BarterCast records that have been spread in the network. In the new mechanism, the dissemination component is replaced by SimilDis, and it differs from the previous dissemination component in two significant ways.

First, instead of 1-hop dissemination, peers are allowed to send the received records to other peers in the network. In current BarterCast, to avoid misreporting, peers are only allowed to spread records about their own direct interactions with other peers. In other words, if p uploads to q then only p and q can inform other peers about this action, but the other peers are not allowed to disseminate it. This restriction limits the record reachability and decreases the reputation accuracy calculated by peers [7]. In SimilDis, to solve this problem, we allow peers to send the received records to other peers in their partial graphs. To prevent misreporting, instead of initiating plain records, the peers who are involved in a data transfer action sign the record with their private keys. With signed records, no one can tamper with and change the record content. Allowing peers to send the received records can increase the dissemination level, but it will also increase the communication, storage, and computation costs. These issues are addressed by targeted dissemination.

Secondly, in gossiping protocols, choosing the right set of rumor receivers is crucial for building a desired overlay [17] and for the overall efficiency of the protocol [11]. Based on this idea, in SimilDis, using the partial graph Gp of peer p, we derive a similarity measure between p and the other peers in Gp. Using this similarity metric, peers who are similar to p get a higher priority to be chosen as the record receivers during the dissemination of BarterCast records. With this modification, the partial views of peers are concentrated around similar peers, and only records that are of value are disseminated and kept by each peer.

In summary, SimilDis operates as follows. Besides its partial graph, each peer builds and maintains a limited-length, ordered similarity list. When sending a record, it selects a set of random peers from its similarity list as the record receivers. The details of the similarity computation and update processes are explained in Section 4.

3. RELATED WORK & BACKGROUND STUDY
Three areas of research are very relevant to the topic of this paper: gossip protocols, information dissemination in reputation and trust systems, and node similarity in graphs. In this section we give a review of each area.

3.1 Gossip Protocols
Since their introduction for database synchronization [9], gossiping protocols have found various applications such as membership management [2], aggregation [18], multicasting [13], and information dissemination [11]. Gossiping protocols consist of three elements: select-partner (whom to send a message to), select-to-send (what to send), and select-to-keep (what to keep from the information received) [33]. The select-partner element plays a key role in the formation of the topology induced by the protocol, and in its effectiveness. Regarding information dissemination, gossiping protocols have a number of desirable properties like resilience to failures, fast convergence, load balancing, and high scalability, which make them suitable for distributed systems.

3.2 Information Dissemination in Reputation and Trust Systems
As mentioned in the introduction, providing reputation evaluators with the right set of information is crucial for accurate evaluation. Hoffman et al. [15] have studied reputation systems from various dimensions and have defined four aspects for their dissemination component: dissemination structure, dissemination approach, dissemination durability, and level of redundancy. The dissemination structure specifies whether the information is collected and disseminated in a centralized fashion, like in eBay, or in a decentralized way, like in Credence [37] and EigenTrust [20]. The dissemination approach categorizes systems as deterministic or probabilistic. Deterministic approaches are usually based on a hierarchical structure [10] or they use a DHT, as in EigenTrust. Dissemination durability is mostly a matter of implementation, but in general there are two types: permanent storage systems which keep information for a long period, like EigenTrust and Credence, and volatile or short-term storage mechanisms, like ARA [14]. Finally, the redundancy aspect relates to the degree of information redundancy, and involves a trade-off between scalability and reliability. Considering these aspects, we categorize BarterCast as distributed, probabilistic, long-term storage, and redundant.

In the computer science literature, the terms reputation and trust are closely related to each other, and sometimes they are used interchangeably. Briefly, trust refers to a subjective opinion about an entity and is less general than reputation [12]. Trust mechanisms have been widely studied in various domains such as multi-agent systems, P2P networks, ad-hoc networks, and wireless sensor networks, and dozens of trust protocols have been proposed [12]. Despite their diversity, as in reputation systems, effective dissemination of behavioral information is a vital requirement for doing a meaningful trust inference [26]. Especially in mobile and sensor networks, due to power and computation limitations, properly building the web of trust (the trust network) is critical for the scalable operation of the system [31]. The method proposed in this paper for targeted dissemination is easily applicable in this area as well.

3.3 Node Similarity
Due to the high volume of generated information and the need to filter and categorize it, similarity measures have gained a lot of interest in the online world, and they are widely used in recommender and collaborative filtering systems [4, 1, 32]. Various types of similarity measures have been introduced. If entities and their relations are transformed into a graph, then we can define a new kind of similarity, called structural similarity [21], which is simply based on the connections between nodes in the graph. The basic premise of structural similarity is that the structure of a network reflects real information about the nodes.

In the network literature, researchers have proposed various approaches to quantify structural similarity. One of the earliest approaches is called structural equivalence [25]. Here, the more neighbors two nodes have in common, the more similar they are. Later, Jeh et al. [16] proposed SimRank, which is predicated on the idea that two nodes are similar if their neighbors are similar. Despite its elegance, SimRank has a number of drawbacks: nodes that are at an odd distance from each other have a similarity of zero, the edge weights are omitted from the similarity measure, and with any change in the graph all similarities have to be recalculated. Besides, SimRank calculates the similarity between every pair of nodes, which in some applications is unnecessary. These drawbacks limit the applicability of SimRank in our problem.

Antonellis et al. [3] proposed an extended version of SimRank, called SimRank++, which for similarity calculation takes into account the edge weights and an external similarity measure called evidence. Except for using the edge weights, SimRank++ still suffers from the other mentioned drawbacks of SimRank. Considering the static and iterative nature of SimRank, Li et al. [22] proposed an incremental version of SimRank.

Based on the idea of regular equivalence (nodes are similar if they are connected to similar nodes), Leicht et al. [21] proposed a linear algebraic method for calculating node similarity. Their fundamental assumption is that an edge between two nodes indicates a similarity between them (as in BarterCast). Unlike SimRank, this method considers both odd and even length paths, but it is still static, computes all pairwise similarities (which is unnecessary in our case), and does not consider edge weights. In view of the limitations of the mentioned similarity methods and our specific requirements for targeted dissemination, we devise and apply our own similarity methods; see Section 4.

4. DESIGN DETAILS
In SimilDis, the partial graph of a peer is used for two purposes: reputation calculation and similarity computation. Using its partial graph, a peer builds a list of peers similar to itself, and when disseminating information the target nodes are chosen from this list. In this section, we explain the process of similarity computation.

4.1 Peer Similarity Requirements
Usually, in distributed search techniques, similarity is derived from a predefined user-item matrix, from which one can infer a similarity measure between users or between items [23]. Even though we do not have such fine-grained data in the partial graphs, an edge between two peers does show their common interest in the same content, which can be used in the similarity computation process. Considering the operational requirements and the properties of partial graphs, we list the following desirable features for the similarity metric in SimilDis:

• An edge between two nodes is a sign of similarity between them and should be accounted for in the similarity calculation.

• The edge weights should be considered in the similarity calculation.

• The similarity between two nodes decreases when the distance between them increases.

• As the partial graph grows (new nodes or edges are added and the weights of existing edges change), the similarity values should be updated dynamically.

• A peer only needs to maintain the similarities between itself and other peers in its partial graph.

• Only the relative similarity (ranks) of peers is important.

Based on these requirements we devise two algorithms to compute similarity, one based on a Directed Acyclic Graph (DAG) derived from the partial graph, and the other based on multiple random walks in the partial graph of a peer. We use these methods since both conform to the mentioned relaxed similarity requirements, and, in our context, they are more efficient in computing similarity values than the existing solutions cited in Section 3.

4.2 DAG-based Similarity
From the point of view of similarity, the direction of a data transfer is not important. Therefore, when in the partial graph Gp there are two directed edges u → v and v → u between nodes u and v, with weights wuv and wvu, respectively, we replace these two edges by a single undirected edge uv with weight wuv + wvu. The new undirected graph created in this way is denoted by Up. This graph is not used for reputation calculation; for this we still use the partial graph itself, so free-riders will not benefit from the higher edge weights in Up.

Starting from the node p in Up, a new labeled weighted DAG Sp is generated, where the label of each node shows its similarity to p. Initially, the graph Sp only contains node p with label sp = 1.0 and level lp = 0. The label 1.0 shows the maximum similarity of p to itself, and the level of a node is the distance of the node to the source node p. For each neighbor q of p in Up, a new edge p → q is added to Sp and the level of q is set to one higher than the level of p, so lq = lp + 1. An edge like p → q induces a parent-child relation between p and q. The weight Wpq of the edge p → q in Sp is obtained by relative splitting of the similarity of p among all its children:

W_{pq} = \frac{w_{pq}}{\sum_{i \in N_p^+} w_{pi}} \times s_p,   (2)

where s_p is the label of p, N_p^+ is the set of children of p, and w_{pq} is the weight of the edge pq in Up. This process of adding nodes and edges to Sp continues with the grandchildren of p until all the nodes in the connected component of Up that p belongs to have been added to Sp.

Starting with the source node p in Sp, using the similarity value of a parent node and its outgoing edge weights, we are able to calculate the similarity values of its children. The similarity of a node q to the source node p is equal to the sum of the weights of its incoming edges in Sp multiplied by a decay factor:

s_q = \theta^{l_q} \times \sum_{i \in N_q^-} W_{iq},   (3)

where l_q is the level of q, θ is a predefined decay factor in (0, 1], and N_q^- is the set of parents of q in Sp. Due to the factor θ^{l_q}, by going further away from the source node, the similarities of the nodes to the source node decrease.
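As an illustrative sketch (not the actual implementation), the static construction of S_p and the labels of Eqs. (2) and (3) can be combined in a single breadth-first pass; edges between nodes at the same level, which the paper handles with dummy nodes (see below), are ignored here for brevity, and all names are purely illustrative.

from collections import deque, defaultdict

def dag_similarities(undirected, p, theta=0.8):
    """Sketch of the DAG-based similarity of Section 4.2.
    `undirected` maps node -> {neighbor: weight}, i.e. the graph U_p.
    Returns a dict of similarity labels; same-level edges are ignored."""
    level, order, queue = {p: 0}, [p], deque([p])
    while queue:                                    # BFS assigns levels (distance from p)
        u = queue.popleft()
        for v in undirected.get(u, {}):
            if v not in level:
                level[v] = level[u] + 1
                order.append(v)
                queue.append(v)

    sim = {p: 1.0}
    incoming = defaultdict(float)                   # node -> sum of incoming DAG edge weights
    for u in order:                                 # parents always precede their children in `order`
        children = {v: w for v, w in undirected.get(u, {}).items()
                    if level[v] == level[u] + 1}
        total = sum(children.values())
        for v, w in children.items():
            incoming[v] += (w / total) * sim[u]     # Eq. (2): split s_u over its children
            sim[v] = theta ** level[v] * incoming[v]  # Eq. (3); final once all parents are processed
    return sim

# Tiny example: edges p-q (weight 4), p-r (6), q-s (2).
U_p = {"p": {"q": 4, "r": 6}, "q": {"p": 4, "s": 2}, "r": {"p": 6}, "s": {"q": 2}}
print(dag_similarities(U_p, "p"))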


Using the above procedure, the graph Sp is built up level by level, but an ambiguity arises when two nodes u and v with the same level have a common edge in Up. In such a situation it is not clear whether v is a child of u or the other way around, and which of the edges u → v or v → u should be added to Sp. If both are added, the graph Sp will not be acyclic, but if no edge is added, we lose valuable similarity-related information. We deal with this issue by having the nodes u and v exchange a fraction of their similarities and by further ignoring the edge uv in the calculation of the similarities of the lower-level nodes. With this strategy, the acyclic property of Sp is preserved, and the edge uv still influences the similarities of u, v, and their children.

To calculate the amount of similarity exchange, first the edge uv is temporarily replaced by a dummy node duv and two edges from u and v to duv in Sp. This replacement is done for all the edges between nodes at the same level as u and v. Then the nodes u and v compute amounts ηu(uv) and ηv(uv) from their similarity and transfer them to duv, which then has similarity ηu(uv) + ηv(uv). Finally, this value is equally split between the nodes u and v, and so the change in the similarity of u will be:

\Delta_u(uv) = \frac{\eta_v(uv) - \eta_u(uv)}{2},   (4)

and for the node v it is Δ_v(uv) = −Δ_u(uv). After processing all the edges ux with l_u = l_x, the new similarity value of u will be s_u + \sum_x \Delta_u(ux).

In the calculation of ηu(uv) and ηv(uv), the nodes u and v are only allowed to play with a portion of their similarity, not with the whole of it; we call this limitation the parental allowance. The parental allowance depends on the strength of the connections to the parents: the stronger the connection, the smaller the fraction of its similarity a node is allowed to give to a dummy node, and vice versa. Without the parental allowance, a node highly similar to its parents might lose much of its similarity and end up only slightly similar to its parents. To calculate the parental allowance, if Πu and Πv are the sums of the weights of the edges connecting u and v to their parents, respectively, then the parental allowances of u and v will be πu = 1 − Πu/(Πu + Πv) and πv = 1 − Πv/(Πu + Πv). The similarity transferred to the dummy node by peer u is now:

\eta_u(uv) = \theta^{l_u+1} \times \pi_u \times \rho_u,   (5)

where ρu is the ratio of wuv/2 to the sum of the weights of all the edges connecting u to its children (including the dummy nodes).

The metaphor for this way of exchanging similarity is that, because of the parental allowance, children that are strongly connected to their parents have less freedom in giving their similarity to others, and loosely connected children have more freedom, which is natural. At the child level, the dummy node is treated like a lost child with its asset (similarity) equally divided between the parents.
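A rough sketch of the same-level exchange of Eqs. (4) and (5) is given below; it assumes that the levels, the total parent-side weights (Π_u, Π_v), and the total child-side weights (including the edges to dummy nodes) are already known, and the variable names are illustrative rather than taken from the implementation.

def similarity_exchange(s_u, s_v, l_u, w_uv, parent_w_u, parent_w_v,
                        child_w_u, child_w_v, theta=0.8):
    """Adjust the similarities of two same-level nodes u and v connected in U_p.
    parent_w_u / parent_w_v: Pi_u / Pi_v, total weight of the edges to the parents.
    child_w_u / child_w_v: total weight of the edges to the children, including dummy nodes.
    Returns the adjusted pair (s_u', s_v')."""
    pi_u = 1.0 - parent_w_u / (parent_w_u + parent_w_v)     # parental allowance of u
    pi_v = 1.0 - parent_w_v / (parent_w_u + parent_w_v)     # parental allowance of v
    rho_u = (w_uv / 2.0) / child_w_u                        # u's share routed to the dummy node
    rho_v = (w_uv / 2.0) / child_w_v
    eta_u = theta ** (l_u + 1) * pi_u * rho_u               # Eq. (5); u and v share the same level
    eta_v = theta ** (l_u + 1) * pi_v * rho_v
    delta_u = (eta_v - eta_u) / 2.0                         # Eq. (4): the dummy's similarity is split back equally
    return s_u + delta_u, s_v - delta_u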

As an example, we will now go through the process of similarity computation for a simple partial graph. Figure 2(a) shows the partial graph of p and Figure 2(b) shows the undirected graph Up generated from it. In Up, the nodes r and q are located at the same distance from the node p and they are connected, so in the graph Sp the edge rq will be replaced by a dummy node and two edges. Figure 3 shows the generated DAG Sp along with the similarity of each node shown as a label beside it. For this example, θ = 0.8, and

Figure 2: A sample partial graph and the associated undirected graph. (a) Partial graph Gp; (b) Undirected graph Up.

the nodes r and s have the highest and lowest similarity to p, with sr = 0.54 and ss = 0.07.

4.3 Random Walk Based Similarity
As a second method for computing similarity values we consider random walks, which have been used to compute similarity previously [35, 36]. In this method, starting from the node p we perform multiple biased random walks of length L in the partial graph Gp. These walks are non-uniform, and the choice of the next node from an arbitrary peer u is proportional to the weights of the outgoing edges of u. Besides, there is a transport factor α < 1.0, which helps the walker come back to the start point. When choosing the next node, with probability α the walker jumps to the start point p and continues the walk from there. After having performed a walk K times, the ratio of the number of times a node is visited to the total number of visits to all nodes (K × L) is taken as the similarity of that node to p. As already mentioned in Section 4, an edge between two nodes is a sign of their interest in the same content, and the higher the edge weight, the higher the similarity between them. In a biased random walk, high-weight edges get a higher chance to be walked, and so the nodes they attach to are visited more often than those connected weakly; accordingly, the hitting times of a node correlate with its similarity to the node performing the random walk. A good property of this method is that its complexity only depends on K and L and is independent of the size of the partial graph.
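The following sketch illustrates the biased random walk with restarts; details such as whether a restart counts as a visit to p, or what happens at a node without outgoing edges, are assumptions made here for concreteness.

import random
from collections import Counter

def rw_similarity(graph, p, L=5, K=10, alpha=0.4, rng=random):
    """Sketch of the random-walk based similarity of Section 4.3.
    `graph` maps node -> {neighbor: weight} (the weighted partial graph G_p).
    Performs K walks of length L starting at p and returns visit ratios."""
    visits = Counter()
    for _ in range(K):
        node = p
        for _ in range(L):
            neighbors = graph.get(node, {})
            if not neighbors or rng.random() < alpha:
                node = p                                  # transport back to the start node
            else:
                targets = list(neighbors)
                weights = [neighbors[t] for t in targets]
                node = rng.choices(targets, weights=weights, k=1)[0]
            visits[node] += 1                             # each step counts as one visit
    total = K * L
    return {n: count / total for n, count in visits.items()}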

Even though the RW-based method is not dynamic, in the face of changes in the partial graph we can use several heuristics to avoid recalculating the similarities for every such change and still have accurate similarity values. From the similarity point of view, the closer to the source node p, the more important are the edges. Up to 2 hops, the similarity update probability is the inverse of the distance from the

Figure 3: The similarity graph Sp for the example of Figure 2(a).


local node p. So, if p does an upload or a download action, then the similarity values are recalculated, but if a neighbor of p does such an action, then the similarity calculation process is re-run only with probability 0.5. For the actions of other nodes, the similarities are only updated if the number of non-processed actions passes a threshold value updatec. It is possible that a peer receives fewer than updatec updates for a long period of time, so that this trigger does not activate during that period. To mitigate this problem, we define a time-based similarity update trigger: after updatet time in which the partial graph has been updated with at least one record, the similarity calculation process is re-run as well.
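These triggers could be wired up roughly as follows; the hop distance of the acting peer, the pending-record counter, and the cycle counter are assumed to be tracked elsewhere, and the concrete decision rules mirror the description above.

import random

def should_recompute(distance_from_p, pending_records, cycles_since_update,
                     update_c=20, update_t=20, rng=random):
    """Sketch of the similarity-recomputation heuristics for the RW-based method.
    Returns True if the similarities should be recalculated now."""
    if distance_from_p == 0:                 # the local peer itself uploaded or downloaded
        return True
    if distance_from_p == 1:                 # a direct neighbor acted: recompute with probability 0.5
        return rng.random() < 0.5
    if pending_records >= update_c:          # count-based trigger for the actions of other nodes
        return True
    if cycles_since_update >= update_t:      # time-based fallback trigger
        return True
    return False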

4.4 Similarity Maintenance & Security
In both the DAG-based and the Random-Walk based methods, a peer p builds and maintains a similarity list of maximum size m with the top m most similar peers to itself, and in the selection of a target node for disseminating a record the peer p refers to this list.

When a peer p joins the network, its partial graph and similarity list are empty, but by doing its first upload or download it creates its first connections in its partial graph, and accordingly it gets new items in its similarity list. Later on, by receiving new records it can update its similarity graph and similarity list. If the similarity list is full, then the least similar peer is replaced by a fresher, more similar peer. When a peer p receives a record, it first updates its partial graph Gp, then its similarity graph Sp (in the DAG-based method), and finally its similarity list.
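Maintaining the bounded similarity list can be as simple as keeping the m best-scoring peers; a minimal sketch (names are illustrative, not taken from the implementation) is shown below.

import heapq

class SimilarityList:
    """Minimal sketch of a bounded, ordered similarity list of maximum size m."""
    def __init__(self, m=5):
        self.m = m
        self.scores = {}                      # peer -> most recent similarity score

    def update(self, peer, score):
        # Insert or refresh a peer; when the list exceeds m entries,
        # the least similar peer is dropped.
        self.scores[peer] = score
        if len(self.scores) > self.m:
            worst = min(self.scores, key=self.scores.get)
            del self.scores[worst]

    def top(self, k):
        # The k most similar peers, used when selecting record receivers.
        return heapq.nlargest(k, self.scores, key=self.scores.get)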

Regarding the security concerns, SimilDis carries security mechanisms against malicious acts like misreporting, sybil attacks, and white-washing. First of all, since the records are double-signed, there is no opportunity for misreporting. Second, the reputation calculation is done as in the sybil-resilient version of BarterCast [8], which is independent of how the records are disseminated. The only remaining concern is the biasing of the partial graph of a peer by a group of malicious peers who try to boost their own reputations at that peer. But this attack strategy is like the sybil attack, and the same sybil defense mechanism is effective here too.

5. DYNAMIC SIMILARITY UPDATE ALGORITHM FOR THE DAG-BASED METHOD
In the dynamic network in which SimilDis is supposed to be executed, new peers may join the network or existing peers may perform data transfers, causing partial graphs to change. In turn, a change in the partial graph of a node may cause the similarity graph to change as well, and as a consequence, it may affect the similarity list. Creating the similarity graph from scratch for every change of the partial graph is not very efficient. In this section we present a dynamic update algorithm for the similarity graph when the partial graph changes. In this algorithm, we use the natural partial ordering (≤) property of a directed acyclic graph on its nodes, where u ≤ v if there is a path from u to v. Due to this property, a change in the similarity of a node u only affects the similarities of the nodes reachable from u. This property enables us to devise an incremental method for updating the similarity values.

Consider a node p and its similarity graph Sp = (Vs, Es), where Vs and Es are the node and edge sets, respectively. In SimilDis, we do not store the undirected graph Up (see Section 4.2); it was only introduced to aid the explanation. In the actual implementation we use two adjacency lists to keep the graph structures, one for in-neighbors and one for out-neighbors, and the undirected weights are computed on the fly using the partial graph itself. Initially, the graph Gp is empty and Sp only contains the node p itself, which is called the root of the DAG. Suppose that peer p wants to update its similarity graph Sp after it has updated its partial graph Gp with the newly received record (u, v, W), indicating that peer u has uploaded an amount W of data to v. Then there are four possible scenarios:

1. u /∈ Vs and v /∈ Vs.

2. u ∈ Vs, v ∈ Vs, and u → v ∈ Es or v → u ∈ Es (both edges cannot coexist).

3. u ∈ Vs or v ∈ Vs, but not both.

4. u ∈ Vs, v ∈ Vs, but u → v /∈ Es and v → u /∈ Es.

Note that the graph Sp is connected and that a node q belongs to it if and only if there is a path from p to q in Gp.

We now explain how the similarity graph is updated in each case.

Scenario 1): There is no path from p to u or to v in Gp, and u and v are not able to join Sp, so Sp does not change.

Scenario 2): Without loss of generality, assume that the existing edge is u → v. Due to the new data transfer between u and v, the weight of this edge is changed in Es, and so the similarity of all the nodes reachable from u should be adapted as well. In this scenario, the structure of Sp and the node levels do not change. To update the similarity values, we start from the node u in Sp and, using Eqs. (2) and (3), we recalculate the edge weights and the similarities of the children of u. This process continues throughout the complete subgraph of Sp of which u is the root.

Scenario 3): Without loss of generality, assume that u ∈ Vs and v /∈ Vs (the direction of the edges in Gp is irrelevant in the construction of Sp). This means that there exists a path from the root node p to u in Gp but not to v (Figure 4(a)). In this scenario, the new edge u → v in Sp will act as a bridge that connects v and all nodes reachable from v in Gp to the similarity graph Sp. To modify Sp, before adding the edge u → v to Sp, our update algorithm treats the component of Gp that v belongs to as a standalone graph, and by starting from v it creates a sub-DAG for this component. In Figure 4(b), this sub-DAG is composed of the nodes v, t, and z. This new sub-DAG is then joined to the main similarity graph by the edge u → v, and lv = lu + 1. After this join operation, the situation becomes like scenario 2, and starting from the node u the similarity values are updated accordingly.

Figure 4: An example for scenario 3; the edge u → v connects the node p to the new part of the graph. (a) Gp and Sp before adding the edge u → v; (b) Gp and Sp after adding the edge u → v.

Scenario 4): This is the most complex scenario and, unlike the previous scenarios, the levels of nodes that are already present in Sp may change. As in scenario 3, first the structure of the graph is adapted, and then the similarity values of the changing nodes are recalculated. In this scenario, the current levels of the nodes u and v dictate how the graph is going to be restructured. There are three possibilities:

• lu − lv = 0. In this case the levels of u and v do not change, but to reflect the impact of the new edge on the similarity graph, using a dummy node duv, u and v amend their similarity according to Eq. (5). Then, like in scenario 2, starting from the nodes u and v, the similarity values of the nodes reachable from these nodes are updated; Figure 5 presents an example.

• lu − lv = −1. The node u is one hop closer to p than the node v, and the new edge changes neither the graph structure nor the node levels. In this case only an edge is added from u to v in Sp; then, like in scenario 2, starting from node u the similarity values are updated.

• lu − lv < −1. Like a domino effect, the new edge u → v causes level changes in the children and parents of v, and the changes ripple until the point that the levels of the nodes do not change any more. Unlike the previous cases, here the direction of an existing edge in Sp may change. The pseudocode of the graph rewiring algorithm for this scenario is presented as Algorithm 1. Here, the queue Q contains the nodes of which the levels are changed, and the node v with the new level lv = lu + 1 is its first item. The algorithm continues by removing an item from Q and processing it, until the queue becomes empty.

Algorithm 1 (Scenario 4 & lu − lv < −1).

 1: Q ← {v}
 2: while Q ≠ ∅ do
 3:   x ← remove(Q)
 4:   for m ∈ children(x) do
 5:     if with the new edge the level of m changes then
 6:       update lm
 7:       if required, remove an existing dummy node
 8:       if required, add a new dummy node
 9:       update the connecting edges to m
10:       add m to Q
11:     end if
12:   end for
13:   for n ∈ parents(x) do
14:     if with the new edge the level of n changes then
15:       update ln
16:       if required, change the direction of the edge from x to n
17:       if required, add a dummy node dxn
18:       add n to Q
19:     end if
20:   end for
21: end while

Figure 5: An example for scenario 4 when the levels of u and v are equal. (a) Gp and Sp before adding the edge u → v; (b) Gp and Sp after adding the edge u → v.

The conditions for adding or removing a dummy node, or for changing the direction of an existing edge, depend on the changes in the node levels. Similar to the other scenarios, after rewiring Sp, by traversing it from node u, the edge weights and similarity values are updated.

Figure 6 shows an example where, due to the new edge u → v, the similarity graph needs to be rewired. In this example, applying Algorithm 1 results in the following changes:

• The levels of the parent and child of u (t and m) change.

• The parent-child relation between t and v is reversed.

• The nodes s and t get a dummy child.

For this example, only the similarities of the nodes r, v, m, t, s, n need to be recalculated.

Figure 6: An example for scenario 4 when lu − lv < −1; the similarity graph before (left) and after (right) adding the edge u → v.

The complexity of the dynamic update algorithm depends on the scenario. For scenario 1, since no graph traversal is done, the complexity is O(1). For the other scenarios, to update the similarity values, starting with the higher-level node (see the scenarios), we traverse the subgraph in a breadth-first-search (BFS) manner, so if |V′| and |E′| are the numbers of nodes and edges in the traversed subgraph, then the complexity of the update process is O(|V′| + |E′|). The size of a traversed subgraph depends on the structure and the growing pattern of the partial graph, and in the worst case it is equal to the whole DAG Sp.

6. EXPERIMENTAL SETUP
We perform trace-driven simulations to evaluate our protocol. This section covers the simulation steps in detail.

6.1 SimilDis Simulation
Our experiments for evaluating SimilDis are based on a trace obtained from the Tribler network. Using a time-ordered list of data transfer actions, we simulate the creation and dissemination of data transfer actions through SimilDis. In order to evaluate its accuracy, we compare the reputation values calculated using SimilDis with the case of having full knowledge (all records are given to all peers). The simulation is run in two phases, the training phase and the testing phase, and accordingly, the trace is split into two parts, one part for each phase. After processing 50% of the trace in the training phase, in which only dissemination is performed and the partial graphs are built up, in the testing phase peers are asked to evaluate the reputations of the peers they upload to.

In the Tribler network, a data transfer action is represented as a tuple (p, q, U, D, t), which indicates that until time t, peer p has uploaded an amount of data U to and downloaded an amount of data D from q. We sort the data transfer actions based on their dissemination time t in the network (the real data transfer time is unknown). Since the crawler may receive the same record from multiple peers, for our experiment we only keep the first occurrence of a record in the network. We filter out duplicate records, singleton nodes (nodes that are not connected to any other node), and the records in which U and D are less than 256 KB. In the final data trace that is fed to the simulator, each BarterCast record (p, q, U, D, t) is replaced by two separate records (p, q, U) and (q, p, D). Since it is not clear which of these actions happened first, we just randomly put one before the other. Since multiple experiments with different orderings showed no meaningful effect on the outcome, we proceeded with a single random ordering of the records. Also, since the BarterCast records are processed by time and the simulator reads the trace sequentially, in the final trace the time t is irrelevant. After applying the mentioned filters we end up with a network of 11.7 K nodes and 28.1 K edges.
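The preprocessing described above can be summarized as a short filtering pass; the duplicate key, the unit of the 256 KB threshold, and the interpretation of the size filter (both amounts below the threshold) are assumptions of this sketch, and singleton-node removal is omitted.

import random

MIN_TRANSFER = 256 * 1024    # 256 KB, assuming amounts are given in bytes

def preprocess(trace):
    """Sketch of the trace preprocessing of Section 6.1.
    `trace` is an iterable of tuples (p, q, U, D, t), sorted by dissemination time t.
    Returns a list of directed records (src, dst, amount)."""
    seen = set()
    records = []
    for p, q, U, D, t in trace:
        if (p, q, U, D, t) in seen:          # keep only the first occurrence of a record
            continue
        seen.add((p, q, U, D, t))
        if U < MIN_TRANSFER and D < MIN_TRANSFER:
            continue                         # drop records with negligible transfers
        pair = [(p, q, U), (q, p, D)]        # split into two directed records
        random.shuffle(pair)                 # the real order of the two actions is unknown
        records.extend(pair)
    return records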

To simulate the record dissemination, we modify the PeerSim simulator [19] and implement SimilDis as a set of new modules on top of PeerSim. The simulation starts with an empty network, and by processing the trace, new nodes and edges are added to the network. Each peer keeps its own local partial graph, its similarity graph, and its similarity list, and when receiving a new record, it updates these structures accordingly. These are the main steps performed in each simulation cycle:

1. Reading Trace: In each simulation cycle, the simulator reads nrec new records from the trace and injects them into the network. In our experiments nrec is set to 20. To imitate reality, the peers who are involved in a data transfer action are the only receivers of a newly read record from the trace. In other words, if the simulator reads the record (p, q, U), then this record is only given to p and q. Later on, in the record sending step, they inform other peers about this record. The trace reading step is performed by the simulator, but the following steps are run by every peer in each cycle.

2. Evaluating Reputation: This step is done only during the testing phase, where for each record (p, q, U), the uploading peer p evaluates the reputation of q. The reputation evaluation is done before the update of the partial graph of p with (p, q, U).

3. Updating Similarity List: For each received record (r, s, U), the peer p first updates its partial graph Gp, then its similarity graph Sp (in the DAG-based method), and finally its similarity list.

4. Sending and Receiving Records: The real dissemination happens in this step, and peers are actively involved in spreading the received records. Each peer has a buffer of size lbuf which contains the candidate outgoing records. If the buffer is full, then a newly received record replaces the oldest one. Also, each peer sends a record at most trec times, which is called the maximum send-age of a record. In each cycle, a peer forwards a maximum number of nmsg messages to a set of peers of size fout (the fan-out) that consists of the top fout most similar peers to p as derived from p's similarity list; a sketch of this step is given below.
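As a rough illustration of step 4, the per-cycle sending logic could look like the sketch below; how records, messages, and packets relate, and the exact buffer bookkeeping, are assumptions made here rather than details taken from the simulator.

from collections import deque

class Disseminator:
    """Sketch of the per-cycle record sending of step 4."""
    def __init__(self, similarity_list, l_buf=10, t_rec=2, f_out=2, n_msg=2):
        self.buffer = deque(maxlen=l_buf)       # candidate outgoing records; the oldest is dropped when full
        self.similarity_list = similarity_list  # any object with a top(k) method, like the earlier sketch
        self.t_rec, self.f_out, self.n_msg = t_rec, f_out, n_msg
        self.send_age = {}                      # record -> how many times it has been sent

    def receive(self, record):
        self.buffer.append(record)
        self.send_age.setdefault(record, 0)

    def cycle(self, send):
        """Forward at most n_msg records to the f_out most similar peers via send(peer, record)."""
        targets = self.similarity_list.top(self.f_out)
        sent = 0
        for record in list(self.buffer):
            if sent >= self.n_msg:
                break
            if self.send_age[record] >= self.t_rec:
                continue                        # a record is forwarded at most t_rec times (its send-age)
            for peer in targets:
                send(peer, record)
            self.send_age[record] += 1
            sent += 1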

6.2 Full-Dissemination Emulation
Because the views of the peers in SimilDis are only partial, the subjective reputation of a peer may differ from one evaluating peer to another. The ground truth for reputation values is obtained when peers have immediate access to all the past interactions in the network. The full graph is the ideal situation, and in effect it is like having a central server which collects all the records. The important point in informing peers about a new record is that when a record like p → q is generated, it is not clear whether in the future it will be useful for an arbitrary peer r or not. If we knew this, then the problem would already be solved. In the ideal case a new record is given to everybody, and this leads to the concept of the full graph. In our previous work [7], we experimented with using such a full graph and compared it with two other ways of improving the reputation accuracy (using a higher number of maxflow hops and computing reputation values from the perspective of the node with the highest betweenness centrality), and we observed that using the full graph is the most influential one.

To build such full knowledge, we create a special graph called Gglob, and after reading each record from the trace, this graph is updated with that record. The graph Gglob is used as the reference graph during reputation evaluations, and from a peer p's point of view, a peer q has two reputations, one obtained using Gp and the other using Gglob. The difference between these values shows the accuracy of SimilDis. In a real environment, the global graph is not kept by any peer; it is only used in our experiments to measure how far the nodes are from the ideal situation. Regarding the overhead of SimilDis, we compare the communication, the storage, and the computation cost against the cost of building and maintaining such a full graph by each peer. The comparison with the hypothetical scenario of Full-Dissemination measures the overheads at their extreme.

Table 1: Simulation parameter settings
Parameter                                        Value
similarity list length                           5
lbuf (size of message buffer)                    10
trec (number of times to send a record)          2
fout (fan-out of dissemination)                  2
nmsg (maximum number of messages per cycle)      2
L (single random walk length)                    5
K (random walk tries)                            10
α (random walk transport factor)                 0.4
updatec (threshold for non-processed records)    20
updatet (consecutive non-processed cycles)       20
θ (decay factor in the DAG-based method)         0.8

6.3 Parameter Setting
The SimilDis protocol has a number of parameters whose values influence the protocol performance (see Table 1). In order to find an appropriate set of values for these parameters, we use a smaller and independent dataset, crawled in 2009. The filtered trace from this dataset that is fed into the simulator contains 4.8 K edges and 2.7 K nodes. Using this trace, we have performed a sensitivity analysis, measuring the reputation error and the costs for the combinations of parameters considered.

Since the total parameter space is too large and evaluating all combinations is impossible, we simplify the sensitivity analysis in two ways. First, we discretize continuous parameters and analyze only a subset of the feasible values. For example, for the transport factor α introduced in Section 4.3, we only evaluate the values 0.1, 0.2, . . . , 0.8. Secondly, for each parameter we perform a separate one-dimensional parameter analysis. In order to do so, we initialize each parameter to a value that gives the lowest cost (e.g., all gossiping-related parameters are set to 1), which may imply a very high error. After initializing the single changing parameter, we fix the others and do simulations with different values for the changing parameter and measure the reputation error. When the change in error between two consecutive experiments is less than the threshold value of 0.02, we fix the changing parameter and repeat this process for the next parameter. Table 1 contains the parameter values thus obtained that we use in the experiments in Section 7.
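The one-at-a-time analysis could be organized roughly as in the following sketch, where run_simulation is a stand-in for the actual trace-driven simulator and is not part of the paper's code.

def sweep_parameters(params, candidates, run_simulation, eps=0.02):
    """Sketch of the one-dimensional parameter analysis of Section 6.3.
    params: dict of parameter name -> initial (lowest-cost) value.
    candidates: dict of parameter name -> ordered list of values to try.
    run_simulation: callable(params) -> reputation error.
    Each parameter is swept while the others are fixed, and frozen once the error
    change between two consecutive experiments drops below eps."""
    for name, values in candidates.items():
        previous_error = run_simulation(params)
        for value in values:
            params[name] = value
            error = run_simulation(params)
            if abs(error - previous_error) < eps:
                break                          # the error has stabilized: freeze this parameter
            previous_error = error
    return params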

7. EVALUATION
We evaluate our protocol from four angles: its accuracy in evaluating reputations; its communication, storage, and computation costs; the benefit of the dynamic similarity update for the DAG-based method; and its resilience in the face of churn. Each result presented in this section is the average of 20 simulation runs.

7.1 Accuracy
Using the partial graph Gp and the global graph Gglob introduced in Section 6, when a record (p, q, U) is read from the trace, we compute the subjective and global reputations given to peer q by peer p, respectively, before updating the graphs Gp and Gglob with this new record.

Figure 7: Histogram and empirical CDF of the reputation evaluation errors in the DAG and RW-based methods. (a) Histogram (bin width 0.065; the intervals are left-open). (b) ECDF (for the RW-based method the 10th and 90th percentiles are at −0.039 and 0.087, and for the DAG-based method at −0.007 and 0.201, respectively).

The difference between these two values shows how the restricted dissemination in SimilDis affects the reputations of peers. Figure 7 shows the histogram and the empirical cumulative distribution function (ECDF) of the reputation evaluation errors (subjective reputation minus global reputation) when using the DAG and Random Walk (RW) based similarity computation methods. This figure contains only the reputation evaluations for which the global reputation is non-zero, in other words, for which there is a meaningful reputation if we have full knowledge. In our previous work [7], we have shown that in terms of accuracy and computation overhead maxflow with 4 hops gives the best result, with a coverage of around 70%. As can be observed, for both methods the error values are concentrated around zero, and the standard deviations of the DAG and RW-based methods are 0.18 and 0.14, respectively. In comparison, as the ECDF plot shows, especially at the high ends, the RW-based method performs better than the DAG-based method. In general, the error values are biased toward positive values; this positive bias may give an opportunity to free-riders, but on the other hand, honest peers may also benefit from it, as they are less likely to be rejected when requesting content from other peers.

7.2 Costs
To simulate Full-Dissemination, we assume that there is an efficient peer discovery service that informs all peers when a peer joins the network.

Figure 8: The communication cost in SimilDis vs. Full-Dissemination (the vertical axis is in log-scale).

Figure 9: The computation cost of maxflow in SimilDis versus Full-Dissemination (the continuous lines show the averages).

To mimic such a service in the simulator, when a record (p, q, U) is read from the trace, peer p is the peer responsible for informing all other peers and sends this record to all existing peers in the network. When a node joins the network, all peers send their previous upload records to this peer as well. Due to the small record size, instead of sending one record in each TCP packet, peers can send multiple records in a single packet. In Tribler, the length of a BarterCast record is 48 bytes, and each TCP packet can carry around 30 records.

We evaluate the communication cost of SimilDis by counting the number of TCP packets and compare it with the case of providing peers with all records. Figure 8 shows the total number of messages sent in each simulation cycle. As can be seen, with both the DAG-based and the RW-based method, SimilDis sends two orders of magnitude fewer messages than Full-Dissemination. The number of messages with RW is a bit higher than with DAG.

The two major computational costs incurred by SimilDis are the costs of maxflow and of the similarity update. In the case of Full-Dissemination, the maxflow computation is the only computational overhead. To compare these costs, we measure the CPU time of each of these algorithms, which is shown in Figures 9 and 10 as a function of the simulation cycle. Since the reputation evaluation happens only in the testing phase, the horizontal axis of Figure 9 does not start at zero. As can be observed, due to the smaller partial graphs in SimilDis, the maxflow computation time in SimilDis is nearly 10 times shorter than in Full-Dissemination.

The graph in Figure 10 compares the similarity update time in the DAG-based versus the RW-based method. The RW-based method is around 10 times faster. Also, since the update algorithm in the DAG-based method depends on how the partial graphs grow, we observe more fluctuations for the DAG-based than for the RW-based method.

Figure 10: The similarity update cost in the DAG-based versus the Random Walk-based method: the average similarity update time (in ms) per simulation cycle (the vertical axis is in log-scale).

Finally, we consider the storage cost of SimilDis versus Full-Dissemination. We take the sizes of the partial graphs and of the Full-Dissemination graph (in terms of the numbers of nodes and edges) as the protocol's storage cost. Similarly to the communication cost, in each simulation cycle we measure the sizes of the partial graphs in SimilDis and compare them with their size in Full-Dissemination, in which each peer has a full copy of the whole network graph. Figure 11 presents the average numbers of nodes and edges in the partial graphs of the peers. As the figure shows, the partial graphs in SimilDis are roughly 100 times smaller than in Full-Dissemination.
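A minimal sketch of this per-cycle measurement, assuming the partial graphs are kept as networkx directed graphs (an assumption made only for illustration, not the simulator's actual data structure):

```python
import networkx as nx

def average_graph_size(partial_graphs):
    """Average numbers of nodes and edges over all peers' partial graphs,
    measured once per simulation cycle."""
    n = len(partial_graphs)
    avg_nodes = sum(g.number_of_nodes() for g in partial_graphs.values()) / n
    avg_edges = sum(g.number_of_edges() for g in partial_graphs.values()) / n
    return avg_nodes, avg_edges

def full_dissemination_size(global_graph: nx.DiGraph):
    """Under Full-Dissemination every peer stores the whole interaction
    graph, so the per-peer storage cost equals the global graph size."""
    return global_graph.number_of_nodes(), global_graph.number_of_edges()
```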

Figure 11: The storage cost in terms of the average numbers of nodes and edges per simulation cycle in SimilDis versus Full-Dissemination, for the DAG-based and RW-based methods (the vertical axis is in log-scale).


7.3 Efficiency of Dynamic Similarity Update
To evaluate the efficiency of the dynamic similarity update algorithm, we rerun the experiments, but this time the similarity graph is updated in a static way. Here, by static we mean that for each change in the partial graph, the similarity graph is recreated from scratch. Even in this method, in order to speed up the similarity computation, we apply the same heuristics for deciding when to recompute the similarity as we use with the RW-based method (see Section 4.3). Figure 12 shows the average dynamic and static update times. As the graph shows, even when using the same heuristics as for random walking, the static method is much slower than the dynamic one.

Figure 12: The runtime of the dynamic versus the static similarity update algorithm in the DAG-based method: the average similarity update time (in ms) per simulation cycle.

7.4 Accuracy Under Churn
To study the behavior of SimilDis under churn, we perform a set of experiments covering different churn rates. In our experiments, peers alternate between an on and an off state. When a peer is on, it receives records and contributes to their dissemination; in the off state, it is inactive and does not receive any records. In our simulation, when a peer becomes on, it is assigned an on period of a certain duration, and vice versa. However, if during an off period of a peer a record appears in the trace that contains that peer, it immediately switches to the on state.

To study different churn intensities, we define the online ratio or as μon/(μon + μoff), where μon and μoff are the average on and off times, respectively. We do experiments with or = 0.1, 0.2, 0.4, 0.8 and with μon = 1, 3, 5, 10, 20. The values μon and μoff are used as the means of the normal distributions from which we generate the peer on and off times; the variance of these distributions is set to one-third of the mean. For each combination of or and μon, Figure 13 presents the average absolute reputation evaluation error (Section 7.1). As can be observed, even with the low online ratio of 0.1, the average reputation evaluation error is very low. This low error means that, due to the targeted dissemination, peers are able to build partial graphs that lead to accurate reputation evaluations even with short online times.
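A minimal sketch of this churn model, given an online ratio and a mean on time; clamping negative samples to a small positive value is our own assumption, as the paper does not specify how such draws are handled.

```python
import random

def session_lengths(online_ratio, mu_on, num_sessions=1000):
    """Generate alternating on/off period lengths.

    The online ratio is mu_on / (mu_on + mu_off), so mu_off follows from
    the chosen ratio; periods are drawn from normal distributions whose
    variance is one third of the mean."""
    mu_off = mu_on * (1.0 - online_ratio) / online_ratio
    periods = []
    for i in range(num_sessions):
        mu = mu_on if i % 2 == 0 else mu_off          # even: on, odd: off
        length = random.gauss(mu, (mu / 3.0) ** 0.5)  # std = sqrt(mu/3)
        periods.append(max(length, 0.01))             # assumed floor for negative draws
    return periods
```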

8. CONCLUSION AND FUTURE WORK
In this paper we have introduced two methods for targeted dissemination of information in a distributed reputation mechanism, one based on building a Directed Acyclic Graph (DAG) and the other based on Random Walks (RW).

Figure 13: The accuracy of SimilDis under churn: the average absolute reputation evaluation error of the DAG-based and RW-based methods for online ratios or = 0.1, 0.2, 0.4, and 0.8 (the horizontal axis is μon).

The evaluation results show that both methods calculate reputations with low errors, with the RW-based method being slightly more accurate than the DAG-based method. In terms of communication, computation, and storage costs, both methods are dramatically more efficient than the Full-Dissemination method, in which the peers receive complete information: both the communication and the storage costs are reduced by a factor of 100. This is very important for power-constrained mobile devices. In general, the methods proposed in this paper, which aim at targeted information dissemination, are applicable in any application in which a graph among the system participants can be built. Moreover, the growth of online social networks has opened a new research area in leveraging social relations to improve the security and performance of network applications [28, 38]. In such networks, the targeted dissemination of this paper can also be adapted and used for effective routing and improved security.

In comparison to the DAG-based method, the RW-based method is non-deterministic, but it is easier to implement. The DAG-based method has the advantage that it assigns a similarity value to all nodes in the graph, and it can be used in any similar application in which SimRank or other structural similarity methods are used, such as targeted query forwarding. Comparing this method with other similarity methods like SimRank (or a variant of it), in other contexts and graphs, and especially in mobile networks where peers have limited space, connectivity, and computation power, is the future work we are planning.

9. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] A. Allavena, A. Demers, and J. Hopcroft. Correctness of a gossip based membership protocol. In 24th ACM PODC, 2005.
[3] I. Antonellis, H. Molina, and C. Chang. Simrank++: Query rewriting through link analysis of the click graph. VLDB Endowment, 1:408–421, 2008.
[4] J. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In 14th UAI. Morgan Kaufmann Publishers Inc., 1998.
[5] G. Conforti, G. Ghelli, P. Manghi, and C. Sartiani. Scalable query dissemination in XPeer. In IEEE IDEAS, 2007.
[6] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms, pages 651–664. MIT Press and McGraw-Hill, second edition, 2001.
[7] R. Delaviz, N. Andrade, and J. Pouwelse. Improving accuracy and coverage in an Internet-deployed reputation mechanism. In IEEE P2P, 2010.
[8] R. Delaviz, N. Andrade, J. Pouwelse, and D. Epema. SybilRes: A sybil-resilient flow-based distributed reputation mechanism. In ICDCS, 2012.
[9] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. Epidemic algorithms for replicated database maintenance. In 6th ACM PODC, pages 1–12, 1987.
[10] T. Dimitriou, G. Karame, and I. Christou. SuperTrust: A secure and efficient framework for handling trust in super peer networks. Distributed Computing and Networking, pages 350–362, 2008.
[11] P. Eugster, R. Guerraoui, A. Kermarrec, and L. Massoulie. Epidemic information dissemination in distributed systems. Computer, 37(5):60–67, 2004.
[12] F. Gómez Mármol and G. Martínez Pérez. Towards pre-standardization of trust and reputation models for distributed and heterogeneous systems. Computer Standards & Interfaces, 32(4):185–196, 2010.
[13] I. Gupta, A. Kermarrec, and A. Ganesh. Efficient epidemic-style protocols for reliable and scalable multicast. In 21st IEEE SRDS, 2002.
[14] M. Ham and G. Agha. ARA: A robust audit to prevent free-riding in P2P networks. In IEEE P2P, 2005.
[15] K. Hoffman, D. Zage, and C. Nita-Rotaru. A survey of attack and defense techniques for reputation systems. ACM Computing Surveys (CSUR), 42(1):1–31, 2009.
[16] G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. In ACM SIGKDD, 2002.
[17] M. Jelasity and O. Babaoglu. T-Man: Gossip-based overlay topology management. Engineering Self-Organising Systems, pages 1–15, 2006.
[18] M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM TOCS, 23:219–252, 2005.
[19] M. Jelasity, A. Montresor, G. Jesi, and S. Voulgaris. PeerSim: A peer-to-peer simulator. http://peersim.sourceforge.net, 2009.
[20] S. Kamvar, M. Schlosser, and H. Garcia-Molina. The EigenTrust algorithm for reputation management in P2P networks. In ACM WWW Conf., 2003.
[21] E. Leicht, P. Holme, and M. Newman. Vertex similarity in networks. Physical Review E, 73(2):026120, 2006.
[22] C. Li, J. Han, G. He, X. Jin, Y. Sun, Y. Yu, and T. Wu. Fast computation of SimRank for static and dynamic information networks. In 13th ACM EDBT, 2010.
[23] M. Li, W. Lee, and A. Sivasubramaniam. Semantic small world: An overlay network for peer-to-peer search. In 12th ICNP, 2004.
[24] T. Locher, P. Moor, S. Schmid, and R. Wattenhofer. Free riding in BitTorrent is cheap. In Proc. Workshop on Hot Topics in Networks (HotNets), pages 85–90, 2006.
[25] F. Lorrain and H. White. Structural equivalence of individuals in social networks. The Journal of Mathematical Sociology, 1(1), 1971.
[26] S. Marti and H. Garcia-Molina. Taxonomy of trust: Categorizing P2P reputation systems. Computer Networks, 50(4):472–484, 2006.
[27] M. Meulpolder, J. Pouwelse, D. Epema, and H. Sips. BarterCast: A practical approach to prevent lazy freeriding in P2P networks. In 6th Workshop on Hot-P2P, 2009.
[28] A. Mohaisen, N. Hopper, and Y. Kim. Keep your friends close: Incorporating trust into social network-based sybil defenses. In INFOCOM 2011, pages 1943–1951. IEEE, 2011.
[29] J. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D. Epema, M. Reinders, M. Van Steen, and H. Sips. Tribler: A social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 20(2):127–138, 2008.
[30] J. Pouwelse, J. Yang, M. Meulpolder, D. Epema, H. Sips, G. Smit, and M. Lew. Buddycast: An operational peer-to-peer epidemic protocol stack. In 14th ASCI Conference, Eindhoven, The Netherlands, 2008.
[31] D. Quercia, S. Hailes, and L. Capra. Lightweight distributed trust propagation. In Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 282–291. IEEE, 2007.
[32] F. Ricci, L. Rokach, and B. Shapira. Introduction to recommender systems handbook. Recommender Systems Handbook, pages 1–35, 2011.
[33] E. Riviere and S. Voulgaris. Gossip-based networking for internet-scale distributed systems. E-Technologies: Transformation in a Connected World, 2011.
[34] S. Seuken, J. Tang, and D. C. Parkes. Accounting mechanisms for distributed work systems. In Proc. 24th AAAI Conference on Artificial Intelligence (AAAI '10), 2010.
[35] H. Tong, C. Faloutsos, and J. Pan. Random walk with restart: Fast solutions and applications. Knowledge and Information Systems, 14:327–346, 2008.
[36] H. Tong, S. Papadimitriou, P. Yu, and C. Faloutsos. Proximity tracking on time-evolving bipartite graphs. In SIAM SDM, 2008.
[37] K. Walsh and E. Sirer. Experience with an object reputation system for peer-to-peer filesharing. In USENIX NSDI, 2006.
[38] C. Wilson, B. Boe, A. Sala, K. Puttaswamy, and B. Zhao. User interactions in social networks and their implications. In 4th ACM EuroSys, pages 205–218. ACM, 2009.
