cs514: intermediate course in computer systems

17
CS514: Intermediate Course in Computer Systems Lecture 25: Nov 17, 2003 Peer-to-peer protocols for file and data replication: file sharing” CS514 P2P file sharing File sharing dominates traffic usage for University of Washington Recent study, presented recently at Cornell by Hank Levy Kazaa This is a recent sea change, so P2P phenomenon worth looking at

Upload: rajashekarpula

Post on 30-May-2018

222 views

Category:

Documents


0 download

TRANSCRIPT

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 1/17

CS514: IntermediateCourse in Computer SystemsLecture 25: Nov 17, 2003“Peer-to-peer protocols for file and data replication: file sharing”

CS514

P2P file sharing

File sharing dominates traffic usagefor University of Washington

Recent study, presented recently atCornell by Hank LevyKazaa

This is a recent sea change, so P2Pphenomenon worth looking at

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 2/17

CS514

File sharing is nothing new

Has been going on over IRC (Internet RelayChat) for years

Chat groups focusing on certain artists or genres

Upload servers were popular for a whileUsers would upload (FTP) a songThis allowed them to download N songsPerformance typically sucked

The record industry shut these downIn all the above it took user effort to findwhat was wanted

CS514

Napster changed everything

Central search engine, but peer-to-peer file transfer 160+ search engines at peek

User attaches to one of themEngine would index collections of its own active usersSearch on a given engine returned results from thatengineUnless not enough results, then would ask other engines

• This is what I understand from Saroiu et.al. U-Washmeasurement paper

Peer-to-peer file transfer improved scalability

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 3/17

CS514

Napster problem

As we all know, the problem withNapster is that it is a single point of litigationGnutella designed as a lawyer-resilient Napster

Nothing more, nothing less (in myopinion)

CS514

In the meanwhile…

Ian Clarke (Edinburgh, 1999*) wasthinking about P2P from an anonymityperspective

He was interested in free speechWhistleblowers, political dissent, etc.

Ian designed Freenet as a classprojectFreenet is not so much file sharing asit is a publishing medium

* Before Gnutella

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 4/17

CS514

Freenet, Gnutella, andDHTs: One out of three…

Gnutella

Freenet

DHT

AnonymityKeywordsearch

Scalability

CS514

Get-em-out-quick projects

Both Gnutella and Freenet were quick-and-dirty prototypes

Neither really worked through all the issues

As a result, both are deeply flawedNevertheless, both captured the popular imagination, and so are worth talking about

And both have spawned new workFor instance, FastTrack (Kazaa, Grokster …)spawned from Gnutella

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 5/17

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 6/17

CS514

Bootstrap

DHT:Have to know at least one active member

Gnutella:Have to know at least one active member

Freenet:Have to know at least one active member

Scaling issue for everybodyBut fundamental operational issue for Gnutella (which cannot have any centralpoint of control)

CS514

Bootstrap

Various possible ways to know a current member Email from friends, web site with list of addresses,rendezvous server with active knowledge of P2PnetworkPastry folks suggest using a universal DHT tobootstrap other DHTs

Gnutella uses rendezvous server approach, called“pong server” or “host cache”!!!

Host cache lists are distributed with software• They may change, so also listed on various websites,

forums, etc.Single point of litigation!Scaling issue also

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 7/17

CS514

Neighbor discovery andselection

DHT:Search network, download neighbor tables,adjust as needed

Gnutella:“Pong” message truncated flooded throughnetwork“Ping” messages returned from each nodevia reverse path of pong

Freenet:Not sure about initial discovery and selection

CS514

Network (neighbor)maintenance

DHT:Nodes ping each other, do repair whenneighbor lost (or in background)

Gnutella:

Nodes only need to have enough activeneighbors … no structure to maintainFreenet:

Loose structure…learns of neighbors near itself in the node ID space over time throughprocess of insertion and search

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 8/17

CS514

File insertion and storage

DHT:File (or file pointer) stored at hashed nodemay be replicated or cached

Gnutella:File stored at owner node

Freenet:File stored in “vicinity” of hashed node

may be cached

CS514

File search and retrieval

DHT:Search for hashed node or replicaOpportunistically may find cache

Gnutella:Truncated flood of search queryFreenet:

“Soft” search based on “steepest-ascent hill-climbing”

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 9/17

CS514

File deletion

DHT:Send delete message to hashed node

Gnutella:Delete from local directory

Freenet:LRU replacement policyNo explicit delete

(No update either, because don’t know howto flush old copies)

CS514

Freenet structure

Every node gets a GUIDGlobally Unique IDSame hash space as filesAssigned by some cryptographic distributedrandom number generation protocol runamong nodes discovered with a truncatedrandom walk

• This is supposed to prevent a newcomer frommaking up its own GUID, though seems easy tobreak to me…

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 10/17

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 11/17

CS514

How Freenet supportsanonymity

All messages are encrypted (hop-by-hop)Source of search (find and insert) is hidden

Each node remembers previous hop in chainback to source

Originator of file does not store file innetworkNodes in search path occasionally(randomly) claim to be the holder of a file

Search can start with initial random walk tohide “location” of searcher (partly deduciblethrough TTL value)

CS514

Freenet scalability: Fitscurve of N 0.28

That’s pretty good! But . . .

(simulation,no deletes)

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 12/17

CS514

Distribution of node degree

Artificial max nodedegree of 250

CS514

Distribution of node degree

Reinforcing nature of learning tends toconcentrate knowledge on a smallpercentage of nodesFreenet inventors consider this a feature,

but . . .“Small world network” (power-law distributionof node degree)

Terrible load balance!!!In essence like centralizing the search!!!

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 13/17

CS514

Freenet’s anonymity is weakat best

Attacker can attach itself all over thenetwork

Pretend to have lots of different keys so attach to lotsof nodesNow can see a lot of the traffic

Attacker can push files out of the system byauthoring lots of files of similar ID, and bysuppressing requests for the file

Cache file itself and answer queries from cache

Other nodes won’t see queries, so won’t refreshcache

CS514

Freenet’s anonymity is weakat best

Originator has to repeatedly refresh the fileLook for these refreshes, and deducelocation of originator

Of course, can similarly find out who is

searching for certain filesIn fact, repressive government’s beststrategy might be to encourage Freenet, inorder to encourage usage by subversivesand find them!

Weak anonymity worse than none at all!

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 14/17

CS514

Fundamental problem withGnutella

Scalability of broadcast searchLow bandwidth nodes essentially unusablebecause of search load

Solution is to elect “super-nodes”(FastTrack does this) among stable, well-connected participantsClients upload index to super-nodesTwo positive effects:

Search sent to nodes better equipped tohandle themSearches sent to fewer nodes total

CS514

Gnutella load balancing isvery bad

Suffers from same “small world network”phenomenon as Freenet

Saroiu et.al. measurement studyBroadcast discovery (ping) tends to

discover already well-connected nodesConnecting to them makes them more well-connected

Well-connected nodes will getdisproportionate share of searchesLike Freenet, this makes it easy for anattacker to be everywhere in the network

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 15/17

CS514

Gnutella file transfer performed poorly

>50% of downloads failed (in 2001)Bad client softwareThinly connected clients

Improvements:Persistent automatic attempts by client (actslike pub/sub in a way)Chunked data

• Allows interrupted downloads to continue later • Allows “striped” parallel download

CS514

P2P measurement study

Saroiu, Gummadi, Gribble, UWashCrawled Gnutella and Napster Measured:

Bottleneck bandwidthsIP-level latenciesConnect/disconnect patternsDegree of sharing and downloadDegree of cooperationCorrelations of above

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 16/17

CS514

P2P measurement studymajor results

Extreme heterogeneity7% of users offer more than 50% of files20% longest sessions an order of magnitude longer than 20% shortestsessionsDesign for extreme heterogeneity

Gnutella overlays often disjoint

CS514

P2P measurement studymajor results

Latency: Closest 20% 4 times closer thanfurthest 20%

Have techniques to find low-latency neighbors

Peers deliberately misrepresent themselves30% advertise lower than actual bandwidth,presumably to discourage upload25% don’t advertise bandwidthDon’t trust clients to be honest, measuretheir capabilities in the system

8/14/2019 CS514: Intermediate Course in Computer Systems

http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 17/17

CS514

Session duration

CS514

Status of distributed filesharing

FastTrack/Kazaa the big player todayRIAA tried to sue them, claiming:

They control the networkThey are aware of and encourage piratingThey profit from it

RIAA lost (now they are suing users!)But I suspect that RIAA is right in that Kazaacould easily control the network