1 - CS7701 – Fall 2004
Review of: Making Gnutella-like P2P Systems Scalable
• Paper by:
  – Yatin Chawathe (AT&T)
  – Sylvia Ratnasamy (Intel)
  – Lee Breslau (AT&T)
  – Nick Lanham (UC Berkeley)
  – Scott Shenker (ICSI)
• Published in:
  – ACM SIGCOMM 2003
• Reviewed by:
  – Todd Sproull
• Discussion Leader:
  – Christoph Jechlitschek
CS7701: Research Seminar on Networking
http://arl.wustl.edu/~jst/cse/770/
2 - CS7701 – Fall 2004
Outline
• Introduction
• Problem Description
• Gia Design
• Simulation Results
• Implementation
• Conclusions
3 - CS7701 – Fall 2004
Introduction
• Peer-to-Peer (P2P) Networks
  – "Systems serving other systems"
  – Potential for millions of users
  – Gained consumer popularity through Napster
• Napster
  – Started in 1999 by Shawn Fanning
  – Enabled music fans to trade songs over a P2P network
  – Clients connected to centralized Napster servers to locate music
  – 2001: a judge ruled Napster had to block all copyrighted material
  – 2002: Napster folded
• RIAA continued after Napster clones
• Gnutella
  – March 14, 2000: Nullsoft released the first version of the software
    • Created by Justin Frankel and Tom Pepper
    • Nullsoft pulled the software the next day
  – Software was reverse engineered
  – Open source clients became available
  – Built around a decentralized approach
4 - CS7701 – Fall 2004
Gnutella
• Distributed search and download
• Unstructured: ad-hoc topology
  – Peers connect to random nodes
• Random search
  – Flood queries across the network
• Scaling problems
  – As the network grows, search overhead increases
[Figure: flooding example. P1 asks "who has 'madonna'?"; the query floods out to P2 through P6. P2 answers with "madonna-ray-of-light.mp3" and P4 answers with "madonna-american-life.mp3".]
5 - CS7701 – Fall 2004
Problem
• Gnutella has notoriously poor scaling
  – Flooding-based search
  – Simply using Distributed Hash Tables does not necessarily fix the problem
• Challenge
  – Improve scaling while maintaining Gnutella's simplicity
• Propose new mechanisms to fix the scalability issues
• Evaluate the performance of these individual components and of the entire network
6 - CS7701 – Fall 2004
What about DHTs?
• Distributed Hash Tables (DHTs)
  – Provide a hash table abstraction over multiple compute nodes
• How it works
  – Each DHT node can store data items
  – Data items are indexed via a lookup key
  – Overlay routing delivers requests for a given key to the responsible node
  – O(log N) message hops in a network of N nodes
  – The DHT adjusts the mapping of keys and the neighbor tables when the node set changes
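The lookup above can be sketched as an iterative walk over per-node routing tables, which the example on the next slide illustrates. This is a toy simulation, not any particular DHT: the node names, tiny key space, and `lookup` helper are invented for the example.

```python
# Minimal sketch of an iterative DHT lookup: each node holds a routing
# table mapping a key to a next-hop neighbor, and a store of the keys
# it is itself responsible for.
routing = {
    "B": {6: "D", 7: "C", 8: "D"},  # B forwards key 6 toward D
    "D": {6: "E"},                   # D knows E is responsible for key 6
    "E": {},                         # E holds key 6 itself
}
stored = {"E": {6: "madonna-ray-of-light.mp3"}}

def lookup(start, key):
    """Follow next-hop pointers until reaching the node that stores the key."""
    node, hops = start, 0
    while key not in stored.get(node, {}):
        node = routing[node][key]    # one overlay hop toward the owner
        hops += 1
    return node, hops

owner, hops = lookup("B", 6)
print(owner, hops)  # E 2 -- reached in two hops: B -> D -> E
```

In a real DHT the routing table holds O(log N) entries and each hop roughly halves the remaining key distance, which is where the O(log N) hop bound comes from.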
7 - CS7701 – Fall 2004
Example
[Figure: iterative lookup for key 6. Node A asks B; B's routing table (key 7 → C, key 8 → D) forwards the query toward D; D's routing table (key 6 → E) forwards it to E, which has key 6, and the answer propagates back along the path.]
8 - CS7701 – Fall 2004
DHT only P2P network?
• Problems
  – P2P clients are transient
    • Clients join and leave at rates that cause a fair amount of "churn"
    • Route failures require O(log N) repair operations
  – Keyword searches are more prevalent, and more important, than exact-match queries
    • e.g., "Madonna Ray of Light mp3" or "Madona Ray Light mp3"
  – Queries are for hay, not needles
    • Most requests are for popular content
    • 50% of content requests are for objects with more than 100 replicas
    • 80% of content requests are for objects with more than 80 replicas
9 - CS7701 – Fall 2004
The Solution
• Design a new Gnutella-like P2P system, "Gia"
  – Short for gianduia, the generic form of the hazelnut spread Nutella
• What's so great about it?
  – Dynamic topology adaptation
    • Accounts for heterogeneity among nodes
  – Active flow control scheme
    • Implements token-based allocation for queries
  – One-hop replication
    • Keeps small nodes next to well connected "higher capacity" nodes
    • Capacity refers to the messages a node can process per unit time
  – Search protocol based on random walks
    • No longer floods the network with requests
10 - CS7701 – Fall 2004
Example
• Make high-capacity nodes easily reachable
  – Dynamic topology adaptation
• Make high-capacity nodes have more answers
  – One-hop replication
• Search efficiently
  – Biased random walks
• Prevent overloaded nodes
  – Active flow control
[Figure: an example query walking the adapted topology]
11 - CS7701 – Fall 2004
Dynamic Topology Adaptation
• Core component of Gia
• Goals
  – Ensure high capacity nodes are the ones with high degree
  – Keep low capacity nodes within short reach of high capacity nodes
• Accomplished through a satisfaction level S
  – When S = 0, the node is dissatisfied
  – As a node accumulates more neighbors, satisfaction rises until it reaches a level of 1
12 - CS7701 – Fall 2004
Adding new neighbors
• Adding neighbor Y to X
  – Add the new neighbor if room exists
  – If there is no room, check whether an existing neighbor can be replaced
  – Goal:
    • Find an existing neighbor with capacity less than or equal to the new neighbor's, with the highest degree
    • Do not drop an already poorly connected neighbor
• Assumptions:
  – Max neighbors of X = 3
  – Capacity of all nodes is the same
[Figure: node X with neighbors A, B, and C; new node Y asks to join]
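The replacement rule above can be sketched as follows. This is a simplified paraphrase of the described behavior (the real Gia algorithm has additional safeguards); the function name and node records are invented for illustration.

```python
def pick_neighbor_to_drop(neighbors, new_capacity):
    """Among existing neighbors with capacity <= the newcomer's,
    pick the one with the highest degree: it is already well
    connected, so it suffers least from losing this link."""
    candidates = [n for n in neighbors if n["capacity"] <= new_capacity]
    if not candidates:
        return None                      # no suitable victim: reject the newcomer
    return max(candidates, key=lambda n: n["degree"])

# All capacities equal, matching the slide's assumption.
neighbors = [
    {"id": "A", "capacity": 10, "degree": 5},
    {"id": "B", "capacity": 10, "degree": 2},
    {"id": "C", "capacity": 10, "degree": 7},
]
dropped = pick_neighbor_to_drop(neighbors, new_capacity=10)
print(dropped["id"])  # C: equal capacity but best connected, safest to drop
```

B, the most poorly connected neighbor, is never chosen, which matches the "do not drop an already poorly connected neighbor" goal.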
13 - CS7701 – Fall 2004
Token Based Flow Control
• A node may query a neighbor only if the neighbor allows it
  – The node must hold a token from that neighbor
• Tokens are sent from a node to its neighbors periodically
  – The token allocation rate is based on the node's ability to process queries
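The scheme above amounts to a per-neighbor token bucket: the neighbor refills it at a rate matched to its own capacity, and each query spends one token. A minimal sketch (the class and the grant counts are invented for illustration):

```python
class TokenBucket:
    """Per-neighbor query allowance, refilled by the neighbor itself."""

    def __init__(self):
        self.tokens = 0

    def grant(self, n=1):
        # The neighbor periodically hands out tokens in proportion
        # to its query-processing capacity.
        self.tokens += n

    def try_send_query(self):
        if self.tokens == 0:
            return False          # no token: the query must be queued
        self.tokens -= 1          # spend one token per query sent
        return True

bucket = TokenBucket()
bucket.grant(2)                   # neighbor allows two queries this period
sent = [bucket.try_send_query() for _ in range(3)]
print(sent)  # [True, True, False]: the third query has no token
```

Because slow nodes grant tokens slowly, overload is prevented at the sender rather than by dropping queries at the receiver.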
14 - CS7701 – Fall 2004
One Hop Replication
• Gia nodes maintain an index of their neighbors' content
  – Improves the efficiency of the search process
  – Allows neighbors to respond to search queries
• Being "close" to content is useful
  – You need not have the requested content yourself, just a pointer to it
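One-hop replication can be sketched as each node answering queries from both its own file list and a cached index of its neighbors' files. The data layout and file names below are invented for the example:

```python
# Each node indexes its neighbors' content, not just its own.
own_files = {"madonna-ray-of-light.mp3"}
neighbor_index = {
    "P4": {"madonna-american-life.mp3"},
    "P6": {"some-other-song.mp3"},
}

def answer(query):
    """Return (holder, file) pointers for matches stored here or one hop away."""
    hits = [("self", f) for f in own_files if query in f]
    for node, files in neighbor_index.items():
        hits += [(node, f) for f in files if query in f]
    return hits

print(answer("madonna"))
# [('self', 'madonna-ray-of-light.mp3'), ('P4', 'madonna-american-life.mp3')]
```

Note the hit for P4 is a pointer, not the file: the querier downloads from P4 directly, so one walk step effectively searches a node and all of its neighbors.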
15 - CS7701 – Fall 2004
Search Protocol
• Based on biased random walks
  – A Gia node selects the highest capacity neighbor for which it has tokens and sends the query there
  – The message is queued if no tokens are available for any neighbor
• Uses two mechanisms for control
  – A TTL bounds the duration of walks
  – A MAX_RESPONSES parameter caps the number of answers a query searches for
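One step of the biased walk can be sketched as: among the neighbors we currently hold tokens for, forward to the one with the highest capacity; otherwise queue the query. Names and capacities are invented for the example:

```python
def forward_query(neighbors, tokens, queue, query):
    """Send the query to the highest-capacity neighbor with an available
    token, or queue it if no tokens are on hand for any neighbor."""
    eligible = [n for n in neighbors if tokens.get(n["id"], 0) > 0]
    if not eligible:
        queue.append(query)              # wait for the next token grant
        return None
    target = max(eligible, key=lambda n: n["capacity"])
    tokens[target["id"]] -= 1            # spend one token on this hop
    return target["id"]

neighbors = [{"id": "A", "capacity": 100},
             {"id": "B", "capacity": 1000},
             {"id": "C", "capacity": 10}]
tokens = {"A": 1, "C": 2}                # no tokens for the best node, B
queue = []
print(forward_query(neighbors, tokens, queue, "q1"))  # A: best among token holders
```

The bias toward capacity means walks spend most of their time at high-degree, high-capacity nodes, which (thanks to one-hop replication) also hold the most answers.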
16 - CS7701 – Fall 2004
Simulations
• Four basic models
  – FLOOD
    • The Gnutella model
  – RWRT
    • Random Walks over Random Topologies
    • Proposed by Lv et al.
  – SUPER
    • Classifies some nodes as "super nodes" based on capacity (> 1000x)
  – GIA
    • The Gia protocol suite
• Capacity
  – The number of messages (queries or add/drop requests) a node can process per unit time
  – Derived from bandwidth distributions measured by Saroiu et al.
    • A fair number of clients have dialup connections
    • The majority use cable modems or DSL
    • Few have "high-speed" connections
17 - CS7701 – Fall 2004
Performance Metrics
• Collapse Point (CP)
  – The per-node query rate beyond which the success rate drops below 90%
  – Referred to as the knee
• Hop-count before collapse (CP-HC)
  – The average hop count prior to collapse
18 - CS7701 – Fall 2004
Performance Comparison
[Figure: collapse point (qps/node, log scale from 0.00001 to 1000) versus replication rate (0.01% to 1%) for GIA, SUPER, RWRT, and FLOOD, each with N = 10,000]
19 - CS7701 – Fall 2004
Factor Analysis
• Effects of individual components
  – Remove each component from Gia one at a time
  – Add each component to RWRT
  – No single component accounts entirely for Gia's success
20 - CS7701 – Fall 2004
Multiple Searches
• CP changes with MAX_RESPONSES
• Replication Factor and MAX_RESPONSES
21 - CS7701 – Fall 2004
Robustness
[Figure: collapse point (qps/node, log scale from 0.001 to 1000) versus per-node max-lifetime (10 to 10,000 seconds) for replication rates of 1.0%, 0.5%, and 0.1%, with static SUPER and static RWRT (1% replication) shown for comparison]
22 - CS7701 – Fall 2004
Active Replication
• Allow higher capacity nodes to replicate files
  – On-demand replication when a high capacity node receives a query and download request
• Active replication can increase the capacity of the nodes serving files from a factor of 38 to a factor of 50
23 - CS7701 – Fall 2004
Implementation
• Satisfaction level
  – Controls the aggressiveness of adaptation
  – Exponential relationship between satisfaction level S and adaptation interval I
  – Define:
    • I = adaptation interval
    • S = satisfaction level
    • T = maximum interval between adaptation iterations
    • K = aggressiveness of the adaptation interval
  – Let I = T × K^(-(1-S))
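Plugging in the endpoints of the formula: a fully satisfied node (S = 1) waits the full interval T, while a fully dissatisfied node (S = 0) adapts every T/K seconds. A quick check with hypothetical values T = 10 s and K = 256 (the constants are illustrative, not from the paper):

```python
def adaptation_interval(S, T=10.0, K=256.0):
    """I = T * K**(-(1 - S)): dissatisfied nodes adapt aggressively."""
    return T * K ** -(1.0 - S)

print(adaptation_interval(1.0))  # 10.0      -> satisfied: wait the full interval T
print(adaptation_interval(0.0))  # 0.0390625 -> dissatisfied: adapt every T/K seconds
```

The exponential form means even a modest rise in satisfaction backs off the adaptation rate sharply, so stable nodes stop churning the topology.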
24 - CS7701 – Fall 2004
Satisfaction Level
• Calculating the satisfaction level
  – S = 0 initially, and whenever the number of neighbors is below a predefined minimum
  – The satisfaction algorithm does the following:
    • Adds up the normalized capacity of all neighbors
      – A high capacity neighbor with low degree is worth more than a high capacity neighbor with high degree
    • Divides the total by the node's own capacity to find S
    • Returns S = 1 if S > 1 or the number of neighbors exceeds a predefined maximum
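The steps above can be sketched as follows. This is a simplified paraphrase: each neighbor contributes its capacity divided by its degree, the sum is normalized by the node's own capacity, and the result is clamped. The min/max constants and the exact weighting are invented for illustration.

```python
MIN_NBRS, MAX_NBRS = 3, 10   # hypothetical predefined limits

def satisfaction(own_capacity, neighbors):
    """neighbors: list of (capacity, degree) pairs.
    A high-capacity, low-degree neighbor contributes the most,
    since its capacity is shared among fewer links."""
    if len(neighbors) < MIN_NBRS:
        return 0.0                        # too few neighbors: dissatisfied
    if len(neighbors) > MAX_NBRS:
        return 1.0                        # more than enough: fully satisfied
    total = sum(cap / deg for cap, deg in neighbors)
    return min(1.0, total / own_capacity) # normalize by own capacity, clamp at 1

# Three neighbors of capacity 100 each; lower degree means a bigger share.
print(satisfaction(100, [(100, 2), (100, 4), (100, 4)]))     # 1.0 (capped)
print(satisfaction(100, [(100, 10), (100, 10), (100, 10)]))  # 0.3
```

The second call shows why degree matters: the same neighbors, spread thin across ten links each, leave this node far less satisfied, so it keeps adapting.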
25 - CS7701 – Fall 2004
Deployment
• PlanetLab
  – A wide-area service deployment testbed spanning North America, Europe, Asia, and the South Pacific
  – Deployed Gia on 83 clients
  – Measured the time to reach "steady state"
26 - CS7701 – Fall 2004
Related Work
• KaZaA
  – At the time of SIGCOMM 2003, little had been published on KaZaA
  – "Understanding KaZaA," Liang et al., 2004
• CAP
  – A cluster-based approach to handling scaling in Gnutella
    • Based on a central clustering server
    • Clusters act as directory servers
• PierSearch
  – Published in SIGCOMM 2004
  – PIER + Gnutella
    • Uses the PIER DHT for hard-to-find content and Gnutella for the more popular
• Gnutella2
  – Aimed at fixing many of the problems with Gnutella
  – Not created by Gnutella's founders, causing some controversy in the community
27 - CS7701 – Fall 2004
Conclusion
• Gia proves to be a scalable Gnutella
  – 3 to 5 orders of magnitude improvement
• An unstructured system works well for popular content
  – A DHT is not necessary in most cases
• Working implementation on PlanetLab