
Improving Search in Peer-to-Peer Networks

Beverly Yang Hector Garcia-Molina

Presented by

Shreeram Sahasrabudhe

([email protected])

Goals

Three search techniques:

1. Iterative Deepening
2. Directed BFS
3. Local Indices

Evaluation and extensive measurements of these techniques on the Gnutella network.

Ready-to-use results and recommendations.

Basically: reduce the number of nodes that handle a query.

Current Techniques

Gnutella: Breadth First Search (BFS) with depth limit D (typically 7).

Disadvantages:
• Wastes resources
• Inefficient

Freenet: Depth First Search (DFS).

Disadvantages:
• Poor response time

Iterative Deepening

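Required:
• A system-wide policy P = {a, b, c}, giving the depths at which successive iterations end.
• The time W between successive iterations.

[Figure: source S with policy P = {a, b, c}. S sends the query with TTL a, and the nodes at depth a freeze it. S waits for W, then sends a Resend carrying TTL a and the query_id. The nodes at depth a unfreeze the query and forward it with TTL b-a, so it reaches depth b.]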
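The freeze-and-resend loop above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `iterative_deepening`, the adjacency-dict graph, and the `has_result` predicate are all hypothetical, and the wait W is only marked by a comment because nothing is sent over a network here.

```python
def iterative_deepening(graph, source, has_result, policy, wanted):
    """Search outward from `source`, one policy depth at a time.

    graph:      dict mapping node -> list of neighbors (assumed representation)
    has_result: hypothetical predicate, True if a node can answer the query
    policy:     increasing depths, e.g. [a, b, c]
    wanted:     number of results that satisfies the query
    Returns (results, nodes_processed, depth_reached).
    """
    visited = {source}
    frontier = [source]   # nodes currently holding the frozen query
    results = []
    processed = 0
    depth = 0
    for limit in policy:
        # "Unfreeze": continue from the frozen frontier down to depth `limit`,
        # so nodes handled in earlier iterations are not processed again.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for nb in graph.get(node, []):
                    if nb not in visited:
                        visited.add(nb)
                        processed += 1
                        if has_result(nb):
                            results.append(nb)
                        next_frontier.append(nb)
            frontier = next_frontier
            depth += 1
        if len(results) >= wanted:
            break
        # A real client would wait for W here before sending the Resend.
    return results, processed, depth


graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
holders = {"C", "E"}
found, cost, depth = iterative_deepening(
    graph, "S", lambda n: n in holders, policy=[1, 2, 3], wanted=1
)
```

In this toy graph the query is satisfied at depth 2, so node E at depth 3 never handles it. That is the saving the technique aims for.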

Directed BFS

Send queries to only a subset of nodes.

The subset is selected by heuristics such as:

Select the node …
• that has returned the highest number of results for previous queries
• whose response messages have taken the lowest average number of hops
• that has forwarded the most messages to our client
• that has the shortest message queue
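The four heuristics listed above can be expressed as scoring functions over per-neighbor statistics. The sketch below assumes the client keeps simple counters for each neighbor. The `NeighborStats` fields, the heuristic names, and `select_neighbors` are illustrative choices, not part of Gnutella or of the original study.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Counters a client might keep per neighbor (hypothetical fields)."""
    results: int = 0        # results returned for previous queries
    response_hops: int = 0  # total hops taken by its response messages
    responses: int = 0      # number of response messages seen
    forwarded: int = 0      # messages it has forwarded to our client
    queue_length: int = 0   # current length of its message queue

    def avg_hops(self):
        # A neighbor with no responses yet ranks last on this heuristic.
        return self.response_hops / self.responses if self.responses else float("inf")


# Each heuristic maps stats to a score; lower scores are preferred.
HEURISTICS = {
    "most_results": lambda s: -s.results,
    "fewest_hops": lambda s: s.avg_hops(),
    "most_forwarded": lambda s: -s.forwarded,
    "shortest_queue": lambda s: s.queue_length,
}


def select_neighbors(stats, heuristic, k=1):
    """Return the k best neighbor names under the named heuristic."""
    score = HEURISTICS[heuristic]
    # Ties are broken by name so the choice is deterministic.
    return sorted(stats, key=lambda name: (score(stats[name]), name))[:k]


stats = {
    "n1": NeighborStats(results=10, response_hops=30, responses=10,
                        forwarded=5, queue_length=4),
    "n2": NeighborStats(results=3, response_hops=4, responses=2,
                        forwarded=50, queue_length=1),
}
best = select_neighbors(stats, "most_results")
```

The query is then sent only to the selected neighbors, which forward it on as in ordinary BFS.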

Local Indices

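Each node n maintains an index of the data held by all nodes within r hops.

So a node can process a query on behalf of every node within r hops.

A small r means less storage (e.g. 70 KB for r = 1).

[Figure: source S with policy P = {1, 5}. Nodes at depths 1 and 5 process the query; nodes at depths 2, 3 and 4 only forward it.]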
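A minimal sketch of how a query might be answered with local indices follows. It assumes an undirected adjacency-dict graph and a `files` dict mapping each node to the names it shares. The helper names are hypothetical, and the index is rebuilt on demand for brevity, whereas a real node would keep it up to date as peers join and leave.

```python
def nodes_within(graph, start, r):
    """Return every node at most r hops from `start`, including `start`."""
    seen, frontier = {start}, [start]
    for _ in range(r):
        nxt = []
        for node in frontier:
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return seen


def build_index(graph, files, node, r):
    """Index (file name -> owners) over all nodes within r hops of `node`."""
    index = {}
    for other in nodes_within(graph, node, r):
        for name in files.get(other, ()):
            index.setdefault(name, set()).add(other)
    return index


def search(graph, files, source, wanted_file, policy, r):
    """Flood from `source`; only nodes at depths listed in `policy` process.

    Returns (owners found, number of nodes that processed the query).
    """
    owners, processed = set(), 0
    seen, frontier, depth = {source}, [source], 0
    max_depth = max(policy)
    while frontier and depth <= max_depth:
        if depth in policy:
            for node in frontier:
                processed += 1
                index = build_index(graph, files, node, r)
                owners |= index.get(wanted_file, set())
        nxt = []
        for node in frontier:  # nodes at other depths only forward
            for nb in graph.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
        depth += 1
    return owners, processed


# Chain S - a - b - c, where b and c share the wanted file.
graph = {"S": ["a"], "a": ["S", "b"], "b": ["a", "c"], "c": ["b"]}
files = {"b": ["song"], "c": ["song"]}
owners, processed = search(graph, files, "S", "song", policy={1}, r=1)
```

Here a single node (a, at depth 1) processes the query and still finds the copy held by b, one hop further out.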

More work

Node join
• The joining node sends a join message with a TTL of r, containing metadata over its collection.
• A node receiving a join message sends back a return join message with its own metadata.
• Periodic refreshes.

Cost?
• QueryJoinRatio = average ratio of queries to join messages
• QueryUpdateRatio = average ratio of queries to update messages
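The join exchange described above can be sketched as follows. This is only an illustration under assumptions: `indices` is a hypothetical dict of per-node indices (node -> {peer: set of file names}), metadata is reduced to a set of file names, and message routing, failures, and the periodic refreshes are left out.

```python
def join(graph, files, indices, new_node, r):
    """Flood a join from `new_node` with TTL r and swap metadata.

    Returns the set of nodes the join message reached.
    """
    own = set(files.get(new_node, ()))
    indices.setdefault(new_node, {})[new_node] = own
    seen, frontier = {new_node}, [new_node]
    for _ in range(r):  # one loop pass per hop of TTL
        nxt = []
        for node in frontier:
            for nb in graph.get(node, ()):
                if nb in seen:
                    continue
                seen.add(nb)
                nxt.append(nb)
                # The join message gives nb the new node's metadata.
                indices.setdefault(nb, {})[new_node] = own
                # The return join gives the new node nb's metadata.
                indices[new_node][nb] = set(files.get(nb, ()))
        frontier = nxt
    return seen - {new_node}


# Chain n - a - b; node n joins with r = 1.
graph = {"n": ["a"], "a": ["n", "b"], "b": ["a"]}
files = {"n": ["x.mp3"], "a": ["y.mp3"], "b": ["z.mp3"]}
indices = {}
reached = join(graph, files, indices, "n", r=1)
```

With r = 1 only the direct neighbor a exchanges metadata with n. Node b, two hops away, is untouched.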

Experiment

Data collection
• Observed Gnutella network traffic for 1 month.
• Determined some general statistics, such as the average number of files shared per user, query strings, etc.

Iterative Deepening
• For each query Q sent: log the response messages arriving within 2 min.
• Ping messages to all neighbors: hops and IP address.
• The same data is used for Local Indices.

Directed BFS
• Same as above, but each query is sent to a single node.

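Cost

[Equations: bandwidth cost in BFS, and processing cost. The formulas did not survive extraction. The labeled terms are:]
• Nodes at depth n
• Redundant edges between depths n-1 and n
• Size of the query message
• Total records
• Response messages from nodes at depth n
• Size of the header
• Size of a record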

Results

Iterative Deepening
• Neighbors = 8
• Desired number of results Z = 50
• Policies P = {P_d = {d, d+1, …, D} for d = 1, 2, 3, …, D}

[Plots: cost as a function of d and of W, with "overshooting" marked; time as a function of W and of d.]
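The policy family used in these experiments, P_d = {d, d+1, …, D}, is easy to enumerate. A minimal sketch, with a hypothetical helper name and the value of D chosen only for the example:

```python
def policies(max_depth):
    """Return {d: [d, d+1, ..., D]} for d = 1..D, where D = max_depth."""
    return {d: list(range(d, max_depth + 1)) for d in range(1, max_depth + 1)}


family = policies(3)
```

The last policy, P_D = {D}, has a single iteration at the full depth, so it behaves like plain BFS with depth limit D.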

Directed BFS

Studied 8 heuristics.

'Random neighbor' is the baseline for comparison.

[Plot: cost of each heuristic.]

Local Indices

Conclusions

Three new search systems specified and tested.

Recommended: Local Indices with r = 1.

Savings: 61% bandwidth, 49% processing.
