Improving Search in Peer-to-Peer Networks
Beverly Yang Hector Garcia-Molina
Presented by
Shreeram Sahasrabudhe
Goals
Three search techniques:
1. Iterative Deepening
2. Directed BFS
3. Local Indices
Evaluation and extensive measurements of these techniques on the Gnutella network.
Ready-to-use results and recommendations.
Basically: reduce the number of nodes that handle a query.
Current Techniques
Gnutella: Breadth-First Search (BFS) with depth limit D (typically 7).
Disadvantages:
• Wastes resources
• Inefficient
Freenet: Depth-First Search (DFS).
Disadvantages:
• Poor response time
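The following is a minimal sketch of Gnutella-style depth-limited BFS flooding. It shows where the wasted resources come from. It assumes the overlay is an undirected graph stored as an adjacency dict, and the names `bfs_flood` and `overlay` are hypothetical. A node processes the query only on first receipt. Copies that arrive later over redundant edges still cost a message but are simply dropped.

```python
from collections import deque


def bfs_flood(overlay, source, depth_limit):
    """Flood a query from `source` up to `depth_limit` hops (hypothetical helper).

    Returns (nodes_processed, messages_sent). Each node forwards the query to
    all neighbors except the one it first received it from; duplicate copies
    over redundant edges are counted as messages but not processed again.
    """
    depth = {source: 0}
    parent = {source: None}
    queue = deque([source])
    messages = 0
    while queue:
        node = queue.popleft()
        if depth[node] == depth_limit:
            continue  # TTL exhausted: this node does not forward
        for neighbor in overlay[node]:
            if neighbor == parent[node]:
                continue  # do not send the query back where it came from
            messages += 1  # every forwarded copy consumes bandwidth
            if neighbor not in depth:  # first copy: neighbor processes it
                depth[neighbor] = depth[node] + 1
                parent[neighbor] = node
                queue.append(neighbor)
    # The source is not counted as a processing node.
    return len(depth) - 1, messages


# Small overlay with one redundant edge (A-B).
overlay = {"S": ["A", "B"], "A": ["S", "B", "C"], "B": ["S", "A"], "C": ["A"]}
print(bfs_flood(overlay, "S", 2))  # (3, 5): two of the five messages cross A-B
```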
Iterative Deepening
Requires a system-wide policy P = {a, b, c}, giving the depths of successive iterations, and a time W between successive iterations.
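[Figure: source S sends the query with TTL a (iteration 1), and nodes at depth a freeze it. S waits W. If the query is not yet satisfied, S sends Resend [(TTL a) + query_id], and the frozen nodes at depth a forward the query with TTL b-a to reach depth b. The same step repeats for depth c.]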
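The iteration scheme above can be simulated with the minimal sketch below. It assumes an adjacency-dict overlay and a known per-node result count (`results_at`). The function and parameter names are hypothetical. The wait W is not modeled: the loop only checks whether Z results have arrived before issuing the next Resend.

```python
def iterative_deepening(overlay, results_at, source, policy, z):
    """Simulate iterative deepening with policy depths, e.g. [a, b, c] (sketch).

    results_at[node] is the number of results a node returns.
    Returns (total_results, nodes_processed, depth_reached).
    """
    seen = {source}
    frontier = [source]  # nodes at the current depth hold the "frozen" query
    current_depth = 0
    results = 0
    processed = 0
    for target_depth in policy:
        # Resend: frontier nodes unfreeze and forward for the remaining hops.
        while current_depth < target_depth and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in overlay[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
                        processed += 1
                        results += results_at.get(neighbor, 0)
            frontier = next_frontier
            current_depth += 1
        # The source would wait W here; we just check satisfaction.
        if results >= z:
            break
    return results, processed, current_depth


# A chain S-A-B-C-D where B and D each hold 30 results.
overlay = {"S": ["A"], "A": ["S", "B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
results_at = {"B": 30, "D": 30}
print(iterative_deepening(overlay, results_at, "S", [1, 2, 4], 25))  # (30, 2, 2)
```

With Z = 25 the search stops at depth 2 after 2 nodes have processed the query. A plain BFS to depth 4 would have involved all 4 nodes.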
Directed BFS
Send queries to a subset of neighbors, selected by heuristics such as:
Select the neighbor …
• that returned the highest number of results for previous queries
• whose response messages have taken the lowest average number of hops
• that has forwarded the most messages to our client
• that has the shortest message queue
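The heuristics listed above can be expressed as scoring functions over per-neighbor statistics. In the minimal sketch below, the `NeighborStats` fields, the heuristic keys and `select_neighbors` are all hypothetical names chosen for illustration. The sketch ranks neighbors by one heuristic and keeps the top k.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Hypothetical per-neighbor statistics a client might keep."""
    name: str
    results_returned: int     # results returned for previous queries
    avg_response_hops: float  # average hops taken by its response messages
    messages_forwarded: int   # messages it has forwarded to our client
    queue_length: int         # length of its message queue


# Each heuristic maps stats to a score; a higher score is better.
HEURISTICS = {
    "most_results": lambda s: s.results_returned,
    "fewest_hops": lambda s: -s.avg_response_hops,
    "most_forwarded": lambda s: s.messages_forwarded,
    "shortest_queue": lambda s: -s.queue_length,
}


def select_neighbors(stats, heuristic, k=1):
    """Return the names of the k best neighbors under the chosen heuristic."""
    ranked = sorted(stats, key=HEURISTICS[heuristic], reverse=True)
    return [s.name for s in ranked[:k]]


stats = [
    NeighborStats("A", 120, 3.5, 40, 10),
    NeighborStats("B", 80, 2.0, 90, 2),
    NeighborStats("C", 10, 4.0, 5, 0),
]
print(select_neighbors(stats, "fewest_hops"))  # ['B']
```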
Local Indices
Each node n maintains an index of the data held by all nodes within r hops of it, so a node can process a query on behalf of every node within r hops. Smaller r means less storage (e.g., 70 KB for r = 1).
[Figure: with policy P = {1, 5}, nodes at depths 1 and 5 process the query; nodes at depths 2, 3 and 4 only forward it.]
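The scheme above can be sketched as follows. It assumes an adjacency-dict overlay, a dict of file names per node, and exact file-name matching. All function names here are hypothetical. Each node indexes everything within r hops. During a search, the query is flooded to depth max(P), and only nodes at depths in P consult their index.

```python
from collections import deque


def nodes_within(overlay, start, r):
    """All nodes within r hops of start (start included)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue
        for neighbor in overlay[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return set(dist)


def build_local_indices(overlay, files, r):
    """index[n] maps a file name to the set of nodes within r hops of n holding it."""
    index = {}
    for n in overlay:
        idx = {}
        for m in nodes_within(overlay, n, r):
            for f in files.get(m, ()):
                idx.setdefault(f, set()).add(m)
        index[n] = idx
    return index


def local_indices_search(overlay, index, source, policy, query):
    """Flood to depth max(policy); only nodes at depths in policy process.

    Returns (nodes holding a match, number of nodes that processed the query).
    """
    process_depths = set(policy)
    max_depth = max(policy)
    depth = {source: 0}
    queue = deque([source])
    hits, processed = set(), 0
    while queue:
        node = queue.popleft()
        if depth[node] in process_depths:
            processed += 1
            hits |= index[node].get(query, set())
        if depth[node] == max_depth:
            continue
        for neighbor in overlay[node]:
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return hits, processed


# A chain 0-1-2-...-6 with "song" stored on nodes 2, 4 and 6.
overlay = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
files = {2: ["song"], 4: ["song"], 6: ["song"]}
index = build_local_indices(overlay, files, r=1)
print(local_indices_search(overlay, index, 0, [1, 4], "song"))  # ({2, 4}, 2)
```

Only 2 nodes process the query, yet they answer for depths 0 through 5. Node 6 lies outside every processing node's r = 1 neighborhood, so its copy is missed.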
More work
• Node join: the joining node sends a join message with a TTL of r, containing metadata over its collection. A node receiving a join message sends back a return join message with its own metadata.
• Periodic refreshes
Cost?
• QueryJoinRatio = average ratio of queries to join messages
• QueryUpdateRatio = average ratio of queries to update messages
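The two ratios above can be used to amortize index maintenance over queries. The sketch below assumes that each join and each update message has a fixed cost. Under that assumption, the maintenance cost charged to one query is the join cost divided by QueryJoinRatio plus the update cost divided by QueryUpdateRatio. The function and parameter names are hypothetical.

```python
def per_query_cost(query_cost, join_cost, update_cost,
                   query_join_ratio, query_update_ratio):
    """Amortized cost per query under a fixed-cost-per-message assumption.

    query_join_ratio: average number of queries per join message
    query_update_ratio: average number of queries per update message
    """
    # One join occurs per query_join_ratio queries, so each query carries
    # 1/query_join_ratio of a join's cost; likewise for updates.
    return (query_cost
            + join_cost / query_join_ratio
            + update_cost / query_update_ratio)


print(per_query_cost(100, 50, 20, 10, 4))  # 110.0
```

A high query-to-join ratio makes the join overhead negligible per query. That is why these ratios matter when judging whether local indices pay off.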
Experiment
Data Collection
• Observed Gnutella network traffic for 1 month.
• Determined general statistics such as the average number of files shared per user, query strings, etc.
• Iterative Deepening: for each query Q sent, log the response messages arriving within 2 minutes. Send Ping messages to all neighbors and record hops and IP addresses. The same data is used for Local Indices.
• Directed BFS: same as above, but each query is sent to a single node.
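Cost
Two cost measures: bandwidth cost and processing cost.
Bandwidth cost in BFS is expressed in terms of:
• number of nodes at depth n
• number of redundant edges between depths n-1 and n
• size of the query message
• number of response messages from nodes at depth n
• total number of records
• size of a response header
• size of a record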
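The terms listed above can be combined into a simple per-depth model, shown in the sketch below. The exact formula is not given here, so this combination is an assumption. Every node newly reached at depth n and every redundant edge between depths n-1 and n carries one copy of the query. Responses from depth n travel n hops back to the source, and each response costs one header plus its records. The function and parameter names are hypothetical.

```python
def bfs_bandwidth(nodes_at_depth, redundant_edges, response_msgs, total_records,
                  query_size, header_size, record_size):
    """Estimate BFS bandwidth (bytes) under an assumed per-depth model.

    Each list is indexed by depth, with index 0 meaning depth n = 1.
    """
    total = 0
    per_depth = zip(nodes_at_depth, redundant_edges, response_msgs, total_records)
    for n, (nodes, redundant, msgs, records) in enumerate(per_depth, start=1):
        # Query copies: one per newly reached node plus one per redundant edge.
        total += (nodes + redundant) * query_size
        # Responses from depth n are relayed n hops back along the reverse path.
        total += n * (msgs * header_size + records * record_size)
    return total


# Depth 1: 2 nodes, 0 redundant edges, 1 response with 5 records.
# Depth 2: 3 nodes, 1 redundant edge, 2 responses with 4 records in total.
print(bfs_bandwidth([2, 3], [0, 1], [1, 2], [5, 4], 80, 50, 100))  # 2030
```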
Results
Iterative Deepening
Neighbors = 8, desired number of results Z = 50, policies P = {P_d = {d, d+1, …, D} for d = 1, 2, …, D}.
[Plots: d vs. cost; W vs. cost ("overshooting"); W vs. time; d vs. time]
Directed BFS
Studied 8 heuristics; "Random neighbor" is the baseline for comparison.
[Plot: cost of each heuristic]
Local Indices
Conclusions
Three new search techniques specified and tested.
Recommendation: Local Indices with r = 1. Savings: 61% bandwidth, 49% processing.