1 - CS7701 – Fall 2004
Review of: Making Gnutella-like P2P Systems Scalable
• Paper by:
  – Yatin Chawathe (AT&T)
  – Sylvia Ratnasamy (Intel)
  – Lee Breslau (AT&T)
  – Nick Lanham (UC Berkeley)
  – Scott Shenker (ICSI)
• Published in:
  – ACM SIGCOMM 2003
• Reviewed by:
  – Todd Sproull
• Discussion Leader:
  – Christoph Jechlitschek
CS7701: Research Seminar on Networking
http://arl.wustl.edu/~jst/cse/770/
2 - CS7701 – Fall 2004
Outline
• Introduction
• Problem Description
• Gia Design
• Simulation Results
• Implementation
• Conclusions
3 - CS7701 – Fall 2004
Introduction
• Peer-to-Peer (P2P) Networks
  – "Systems serving other systems"
  – Potential for millions of users
  – Gained consumer popularity through Napster
• Napster
  – Started in 1999 by Shawn Fanning
  – Enabled music fans to trade songs over a P2P network
  – Clients connected to centralized Napster servers to locate music
  – 2001: a judge ruled Napster had to block all copyrighted material
  – 2002: Napster folded
• RIAA continued after Napster clones
• Gnutella
  – March 14, 2000: Nullsoft released the first version of the software
    • Created by Justin Frankel and Tom Pepper
    • Nullsoft pulled the software the next day
  – Software was reverse engineered
  – Open source clients became available
  – Built around a decentralized approach
4 - CS7701 – Fall 2004
Gnutella
• Distributed search and download
• Unstructured: ad-hoc topology
  – Peers connect to random nodes
• Random search
  – Flood queries across the network
• Scaling problems
  – As the network grows, search overhead increases
[Figure: flooding example. P1 asks "who has 'madonna'?"; the query floods out to P2 through P6. P2 answers with "madonna-ray-of-light.mp3" and P4 answers with "madonna-american-life.mp3".]
5 - CS7701 – Fall 2004
Problem
• Gnutella has notoriously poor scaling
  – Flooding-based search
  – Simply using Distributed Hash Tables does not necessarily fix the problem
• Challenge
  – Improve scaling while maintaining Gnutella's simplicity
• Propose new mechanisms to fix the scalability issues
• Evaluate the performance of these individual components and of the entire network
6 - CS7701 – Fall 2004
What about DHTs?
• Distributed Hash Tables (DHTs)
  – Provide a hash table abstraction over multiple compute nodes
• How it works
  – Each DHT node can store data items
  – Data items are indexed via a lookup key
  – Overlay routing delivers requests for a given key to the responsible node
  – O(log N) message hops in a network of N nodes
  – The DHT adjusts the mapping of keys and the neighbor tables when the node set changes
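The lookup above can be sketched as an iterative walk over per-node routing tables, which the example on the next slide illustrates. This is a toy simulation, not any particular DHT: the node names, tiny key space, and `lookup` helper are invented for the example.

```python
# Minimal sketch of an iterative DHT lookup: each node holds a routing
# table mapping a key to a next-hop neighbor, and a store of the keys
# it is itself responsible for.
routing = {
    "B": {6: "D", 7: "C", 8: "D"},  # B forwards key 6 toward D
    "D": {6: "E"},                   # D knows E is responsible for key 6
    "E": {},                         # E holds key 6 itself
}
stored = {"E": {6: "madonna-ray-of-light.mp3"}}

def lookup(start, key):
    """Follow next-hop pointers until reaching the node that stores the key."""
    node, hops = start, 0
    while key not in stored.get(node, {}):
        node = routing[node][key]    # one overlay hop toward the owner
        hops += 1
    return node, hops

owner, hops = lookup("B", 6)
print(owner, hops)  # E 2 -- reached in two hops: B -> D -> E
```

In a real DHT the routing table holds O(log N) entries and each hop roughly halves the remaining key distance, which is where the O(log N) hop bound comes from.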
7 - CS7701 – Fall 2004
Example
[Figure: iterative lookup for key 6. Node A asks B; B's routing table (key 7 → C, key 8 → D) forwards the query toward D; D's routing table (key 6 → E) forwards it to E, which has key 6, and the answer propagates back along the path.]
8 - CS7701 – Fall 2004
DHT only P2P network?
• Problems
  – P2P clients are transient
    • Clients join and leave at rates that cause a fair amount of "churn"
    • Route failures require O(log N) repair operations
  – Keyword searches are more prevalent, and more important, than exact-match queries
    • e.g., "Madonna Ray of Light mp3" or "Madona Ray Light mp3"
  – Queries are for hay, not needles
    • Most requests are for popular content
    • 50% of content requests are for objects with more than 100 replicas
    • 80% of content requests are for objects with more than 80 replicas
9 - CS7701 – Fall 2004
The Solution
• Design a new Gnutella-like P2P system, "Gia"
  – Short for gianduia, the generic form of the hazelnut spread Nutella
• What's so great about it?
  – Dynamic topology adaptation
    • Accounts for heterogeneity among nodes
  – Active flow control scheme
    • Implements token-based allocation for queries
  – One-hop replication
    • Keeps small nodes next to well connected "higher capacity" nodes
    • Capacity refers to the messages a node can process per unit time
  – Search protocol based on random walks
    • No longer floods the network with requests
10 - CS7701 – Fall 2004
Example
• Make high-capacity nodes easily reachable
  – Dynamic topology adaptation
• Make high-capacity nodes have more answers
  – One-hop replication
• Search efficiently
  – Biased random walks
• Prevent overloaded nodes
  – Active flow control
[Figure: an example query walking the adapted topology]
11 - CS7701 – Fall 2004
Dynamic Topology Adaptation
• Core component of Gia
• Goals
  – Ensure high capacity nodes are the ones with high degree
  – Keep low capacity nodes within short reach of high capacity nodes
• Accomplished through a satisfaction level S
  – When S = 0, the node is dissatisfied
  – As a node accumulates more neighbors, satisfaction rises until it reaches a level of 1
12 - CS7701 – Fall 2004
Adding new neighbors
• Adding neighbor Y to X
  – Add the new neighbor if room exists
  – If there is no room, check whether an existing neighbor can be replaced
  – Goal:
    • Find an existing neighbor with capacity less than or equal to the new neighbor's, with the highest degree
    • Do not drop an already poorly connected neighbor
• Assumptions:
  – Max neighbors of X = 3
  – Capacity of all nodes is the same
[Figure: node X with neighbors A, B, and C; new node Y asks to join]
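The replacement rule above can be sketched as follows. This is a simplified paraphrase of the described behavior (the real Gia algorithm has additional safeguards); the function name and node records are invented for illustration.

```python
def pick_neighbor_to_drop(neighbors, new_capacity):
    """Among existing neighbors with capacity <= the newcomer's,
    pick the one with the highest degree: it is already well
    connected, so it suffers least from losing this link."""
    candidates = [n for n in neighbors if n["capacity"] <= new_capacity]
    if not candidates:
        return None                      # no suitable victim: reject the newcomer
    return max(candidates, key=lambda n: n["degree"])

# All capacities equal, matching the slide's assumption.
neighbors = [
    {"id": "A", "capacity": 10, "degree": 5},
    {"id": "B", "capacity": 10, "degree": 2},
    {"id": "C", "capacity": 10, "degree": 7},
]
dropped = pick_neighbor_to_drop(neighbors, new_capacity=10)
print(dropped["id"])  # C: equal capacity but best connected, safest to drop
```

B, the most poorly connected neighbor, is never chosen, which matches the "do not drop an already poorly connected neighbor" goal.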
13 - CS7701 – Fall 2004
Token Based Flow Control
• A node may query a neighbor only if the neighbor allows it
  – The node must hold a token from that neighbor
• Tokens are sent from a node to its neighbors periodically
  – The token allocation rate is based on the node's ability to process queries
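The scheme above amounts to a per-neighbor token bucket: the neighbor refills it at a rate matched to its own capacity, and each query spends one token. A minimal sketch (the class and the grant counts are invented for illustration):

```python
class TokenBucket:
    """Per-neighbor query allowance, refilled by the neighbor itself."""

    def __init__(self):
        self.tokens = 0

    def grant(self, n=1):
        # The neighbor periodically hands out tokens in proportion
        # to its query-processing capacity.
        self.tokens += n

    def try_send_query(self):
        if self.tokens == 0:
            return False          # no token: the query must be queued
        self.tokens -= 1          # spend one token per query sent
        return True

bucket = TokenBucket()
bucket.grant(2)                   # neighbor allows two queries this period
sent = [bucket.try_send_query() for _ in range(3)]
print(sent)  # [True, True, False]: the third query has no token
```

Because slow nodes grant tokens slowly, overload is prevented at the sender rather than by dropping queries at the receiver.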
14 - CS7701 – Fall 2004
One Hop Replication
• Gia nodes maintain an index of their neighbors' content
  – Improves the efficiency of the search process
  – Allows neighbors to respond to search queries
• Being "close" to content is useful
  – You need not have the requested content yourself, just a pointer to it
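One-hop replication can be sketched as each node answering queries from both its own file list and a cached index of its neighbors' files. The data layout and file names below are invented for the example:

```python
# Each node indexes its neighbors' content, not just its own.
own_files = {"madonna-ray-of-light.mp3"}
neighbor_index = {
    "P4": {"madonna-american-life.mp3"},
    "P6": {"some-other-song.mp3"},
}

def answer(query):
    """Return (holder, file) pointers for matches stored here or one hop away."""
    hits = [("self", f) for f in own_files if query in f]
    for node, files in neighbor_index.items():
        hits += [(node, f) for f in files if query in f]
    return hits

print(answer("madonna"))
# [('self', 'madonna-ray-of-light.mp3'), ('P4', 'madonna-american-life.mp3')]
```

Note the hit for P4 is a pointer, not the file: the querier downloads from P4 directly, so one walk step effectively searches a node and all of its neighbors.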
15 - CS7701 – Fall 2004
Search Protocol
• Based on biased random walks
  – A Gia node selects the highest capacity neighbor for which it has tokens and sends the query there
  – The message is queued if no tokens are available for any neighbor
• Uses two mechanisms for control
  – A TTL bounds the duration of walks
  – A MAX_RESPONSES parameter caps the number of answers a query searches for
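One step of the biased walk can be sketched as: among the neighbors we currently hold tokens for, forward to the one with the highest capacity; otherwise queue the query. Names and capacities are invented for the example:

```python
def forward_query(neighbors, tokens, queue, query):
    """Send the query to the highest-capacity neighbor with an available
    token, or queue it if no tokens are on hand for any neighbor."""
    eligible = [n for n in neighbors if tokens.get(n["id"], 0) > 0]
    if not eligible:
        queue.append(query)              # wait for the next token grant
        return None
    target = max(eligible, key=lambda n: n["capacity"])
    tokens[target["id"]] -= 1            # spend one token on this hop
    return target["id"]

neighbors = [{"id": "A", "capacity": 100},
             {"id": "B", "capacity": 1000},
             {"id": "C", "capacity": 10}]
tokens = {"A": 1, "C": 2}                # no tokens for the best node, B
queue = []
print(forward_query(neighbors, tokens, queue, "q1"))  # A: best among token holders
```

The bias toward capacity means walks spend most of their time at high-degree, high-capacity nodes, which (thanks to one-hop replication) also hold the most answers.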
16 - CS7701 – Fall 2004
Simulations
• Four basic models
  – FLOOD
    • The Gnutella model
  – RWRT
    • Random Walks over Random Topologies
    • Proposed by Lv et al.
  – SUPER
    • Classifies some nodes as "super nodes" based on capacity (> 1000x)
  – GIA
    • The Gia protocol suite
• Capacity
  – The number of messages (queries or add/drop requests) a node can process per unit time
  – Derived from bandwidth distributions measured by Saroiu et al.
    • A fair number of clients have dialup connections
    • The majority use cable modems or DSL
    • Few have "high-speed" connections
17 - CS7701 – Fall 2004
Performance Metrics
• Collapse Point (CP)
  – The per-node query rate beyond which the success rate drops below 90%
  – Referred to as the knee
• Hop-count before collapse (CP-HC)
  – The average hop count prior to collapse
18 - CS7701 – Fall 2004
Performance Comparison
[Figure: collapse point (qps/node, log scale from 0.00001 to 1000) versus replication rate (0.01% to 1%) for GIA, SUPER, RWRT, and FLOOD, each with N = 10,000]
19 - CS7701 – Fall 2004
Factor Analysis
• Effects of individual components
  – Remove each component from Gia one at a time
  – Add each component to RWRT
  – No single component accounts entirely for Gia's success
20 - CS7701 – Fall 2004
Multiple Searches
• CP changes with MAX_RESPONSES
• Replication Factor and MAX_RESPONSES
21 - CS7701 – Fall 2004
Robustness
[Figure: collapse point (qps/node, log scale from 0.001 to 1000) versus per-node max-lifetime (10 to 10,000 seconds) for replication rates of 1.0%, 0.5%, and 0.1%, with static SUPER and static RWRT (1% replication) shown for comparison]
22 - CS7701 – Fall 2004
Active Replication
• Allow higher capacity nodes to replicate files
  – On-demand replication when a high capacity node receives a query and download request
• Active replication can increase the capacity of the nodes serving files from a factor of 38 to a factor of 50
23 - CS7701 – Fall 2004
Implementation
• Satisfaction level
  – Controls the aggressiveness of adaptation
  – Exponential relationship between satisfaction level S and adaptation interval I
  – Define:
    • I = adaptation interval
    • S = satisfaction level
    • T = maximum interval between adaptation iterations
    • K = aggressiveness of the adaptation interval
  – Let I = T × K^(-(1-S))
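Plugging in the endpoints of the formula: a fully satisfied node (S = 1) waits the full interval T, while a fully dissatisfied node (S = 0) adapts every T/K seconds. A quick check with hypothetical values T = 10 s and K = 256 (the constants are illustrative, not from the paper):

```python
def adaptation_interval(S, T=10.0, K=256.0):
    """I = T * K**(-(1 - S)): dissatisfied nodes adapt aggressively."""
    return T * K ** -(1.0 - S)

print(adaptation_interval(1.0))  # 10.0      -> satisfied: wait the full interval T
print(adaptation_interval(0.0))  # 0.0390625 -> dissatisfied: adapt every T/K seconds
```

The exponential form means even a modest rise in satisfaction backs off the adaptation rate sharply, so stable nodes stop churning the topology.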
24 - CS7701 – Fall 2004
Satisfaction Level
• Calculating the satisfaction level
  – S = 0 initially, and whenever the number of neighbors is below a predefined minimum
  – The satisfaction algorithm does the following:
    • Adds up the normalized capacity of all neighbors
      – A high capacity neighbor with low degree is worth more than a high capacity neighbor with high degree
    • Divides the total by the node's own capacity to find S
    • Returns S = 1 if S > 1 or the number of neighbors exceeds a predefined maximum
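The steps above can be sketched as follows. This is a simplified paraphrase: each neighbor contributes its capacity divided by its degree, the sum is normalized by the node's own capacity, and the result is clamped. The min/max constants and the exact weighting are invented for illustration.

```python
MIN_NBRS, MAX_NBRS = 3, 10   # hypothetical predefined limits

def satisfaction(own_capacity, neighbors):
    """neighbors: list of (capacity, degree) pairs.
    A high-capacity, low-degree neighbor contributes the most,
    since its capacity is shared among fewer links."""
    if len(neighbors) < MIN_NBRS:
        return 0.0                        # too few neighbors: dissatisfied
    if len(neighbors) > MAX_NBRS:
        return 1.0                        # more than enough: fully satisfied
    total = sum(cap / deg for cap, deg in neighbors)
    return min(1.0, total / own_capacity) # normalize by own capacity, clamp at 1

# Three neighbors of capacity 100 each; lower degree means a bigger share.
print(satisfaction(100, [(100, 2), (100, 4), (100, 4)]))     # 1.0 (capped)
print(satisfaction(100, [(100, 10), (100, 10), (100, 10)]))  # 0.3
```

The second call shows why degree matters: the same neighbors, spread thin across ten links each, leave this node far less satisfied, so it keeps adapting.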
25 - CS7701 – Fall 2004
Deployment
• PlanetLab
  – A wide-area service deployment testbed spanning North America, Europe, Asia, and the South Pacific
  – Deployed Gia on 83 clients
  – Measured the time to reach "steady state"
26 - CS7701 – Fall 2004
Related Work
• KaZaA
  – At the time of SIGCOMM 2003, little had been published on KaZaA
  – "Understanding KaZaA," Liang et al., 2004
• CAP
  – A cluster-based approach to handling scaling in Gnutella
    • Based on a central clustering server
    • Clusters act as directory servers
• PierSearch
  – Published in SIGCOMM 2004
  – PIER + Gnutella
    • Uses the PIER DHT for hard-to-find content and Gnutella for the more popular
• Gnutella2
  – Aimed at fixing many of the problems with Gnutella
  – Not created by Gnutella's founders, causing some controversy in the community
27 - CS7701 – Fall 2004
Conclusion
• Gia proves to be a scalable Gnutella
  – 3 to 5 orders of magnitude improvement
• An unstructured system works well for popular content
  – A DHT is not necessary in most cases
• Working implementation on PlanetLab