1
Peer-to-Peer Systems
2
Introduction
What is a peer?
One that is of equal standing with another
Peer-to-peer
A way of structuring distributed applications
Each node acts as both a client and a server
3
Client-server vs. peer-to-peer network
Example: how to find an object in the network
• Client-server approach
– Use a big server to store objects and provide a directory for lookup
• Peer-to-peer approach
– Data are fully distributed
– Each peer acts as both a client and a server
– By asking?
Client-server
• Client is dumb
• Server does most things, but…
Peer-to-peer
• The peers have equal functionality
– Client, server, router
4
Characteristics of P-to-P
Multiple peers participate in the network
The number of peers is large
Each peer contains some shared resources
Distributed, decentralized
Self-control
Ad hoc participation
Dynamic
Resource sharing, cost sharing
5
Applications
File sharing
• Napster, Gnutella
Instant messaging
• ICQ
Gaming
Information hiding
Etc.
6
General assumption in P-to-P
7
Topology
Distributing objects, centralizing directory
Napster
• Most famous; motivated the whole field of P2P research
Distributing objects without centralizing directory
Gnutella
• No centralized directory servers
• Pings the net to locate friends
• File requests are broadcast to friends
• When a provider is located, the file is transferred via HTTP
Freenet
8
Distributing objects and directories
Chord
CAN
Hypercube
• PRR
• Pastry
• Tapestry
• Etc.
YAPPERS
Distributing objects and multiple servers
Super-peer networks
9
Desirable properties
Deterministic location: if an object exists anywhere in the network, it should be located
Routing locality: routes should have low stretch (the ratio of the overlay route length to the shortest path between the same two nodes)
Load balance: the load of storing objects (or object locations) and routing information should be evenly distributed over network nodes
Dynamic membership: the network should adapt to joining and leaving nodes while maintaining the above properties
10
11
12
Gnutella: summary
Fully distributed
Simple, efficient, flexible query
High network traffic
The cost of a search is unbounded
The lifetime of a message is unknown: only its hop count is known, not its duration
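The flooding search summarized here can be sketched as a breadth-first broadcast limited by a hop count (TTL). This is a minimal illustration, not the Gnutella wire protocol: the `overlay` graph, the node names, and the `flood_query` function are assumptions made for the example. It shows why traffic is high: every forwarded copy counts as a message, including duplicates that the receiver drops.

```python
from collections import deque

def flood_query(graph, start, has_file, ttl):
    """Breadth-first flood from `start`; each hop uses up one unit of TTL.

    graph: dict mapping node -> list of neighbor nodes (hypothetical overlay)
    has_file: set of nodes that hold the requested file
    Returns (providers found, number of messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    providers, messages = set(), 0
    while queue:
        node, remaining = queue.popleft()
        if node in has_file:
            providers.add(node)
        if remaining == 0:
            continue  # hop budget exhausted, do not forward further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in seen:  # duplicates are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return providers, messages

# Hypothetical five-node overlay; "e" is three hops away from "a".
overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
```

With a TTL of 2 the query from "a" never reaches "e"; raising the TTL to 3 finds it at the cost of more messages.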
13
Freenet
Selective routing
Queries for files follow a route biased by hints
Replication of data clustering
Key clustering
Improve data availability
14
Chord
A distributed lookup protocol
Routing table is distributed
Given a key, it maps the key onto a node
15
Base Chord protocol
Consistent hashing
• The consistent hash function assigns each node and key an m-bit identifier using a base hash function
– A node’s identifier is chosen by hashing the node’s IP address
– A key identifier is produced by hashing the key
– m must be large enough to make the probability of two nodes hashing to the same identifier negligible
– Identifiers are ordered on an identifier circle modulo 2^m
– Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space
– This node is called successor(k)
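The key-to-node mapping above can be sketched in a few lines. This is an illustration under stated assumptions: SHA-1 is used as the base hash, the digest is reduced modulo 2^m, m is kept small for readability, and the function names `chord_id` and `successor` are chosen for this example.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; deliberately small, for illustration only

def chord_id(value, m=M):
    """Hash a string (a node's IP address or a key) to an m-bit identifier."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(k, node_ids):
    """First node identifier equal to or following k on the identifier circle."""
    ring = sorted(node_ids)
    index = bisect_left(ring, k)
    return ring[index % len(ring)]  # wrap around past the largest identifier
```

For example, with node identifiers 1, 8 and 14, key 10 is assigned to node 14, and key 15 wraps around the circle to node 1.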
16
17
What should be done when a node n joins or leaves the system?
Scalable key location
Routing information
• Each node is aware only of its successor node on the circle
– Inefficient
• Each node, n, maintains a routing table with at most m entries
– Finger table
– A node’s finger table generally does not contain enough information to determine the successor of an arbitrary key k
– The finger pointers are at repeatedly doubling distances around the circle, so each forwarding step halves the distance to the target identifier
– A lookup takes O(log N) hops
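The finger-table lookup can be simulated on a small circle. The sketch below assumes global knowledge of the ring (a real node only knows its own fingers), uses m = 6, and takes ten illustrative node identifiers; all function names are hypothetical. Finger i of node n is taken to be successor((n + 2^(i-1)) mod 2^m).

```python
M = 6  # identifier bits, so the circle has 2**6 = 64 positions

def dist(a, b, m=M):
    """Clockwise distance from identifier a to identifier b."""
    return (b - a) % (2 ** m)

def successor(k, ring):
    """First node id equal to or following k, wrapping around the circle."""
    for node in sorted(ring):
        if node >= k:
            return node
    return min(ring)

def finger_table(n, ring, m=M):
    """Entry i (0-based) is successor((n + 2**i) mod 2**m)."""
    return [successor((n + 2 ** i) % (2 ** m), ring) for i in range(m)]

def lookup(start, k, ring, m=M):
    """Return (node responsible for k, number of forwarding hops)."""
    node, hops = start, 0
    while True:
        fingers = finger_table(node, ring, m)
        succ = fingers[0]
        if dist(node, k, m) == 0 or succ == node:
            return node, hops  # k is this node's own id, or the ring has one node
        if dist(node, k, m) <= dist(node, succ, m):
            return succ, hops  # k lies in (node, successor]
        # Forward to the finger that most closely precedes k.
        preceding = [f for f in fingers if 0 < dist(node, f, m) < dist(node, k, m)]
        node = max(preceding, key=lambda f: dist(node, f, m))
        hops += 1

ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # illustrative node ids
```

On this ring a lookup for key 54 starting at node 8 is forwarded to node 42 and then to node 51, which reports node 56 as the successor.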
18
19
Node joins
Each node maintains a predecessor pointer
When node n joins
• Initialize the predecessor and fingers of node n
• Update the fingers and predecessors of existing nodes to reflect the addition of n
– Node n will become the ith finger of node p iff
– p precedes n by at least 2^(i-1), and
– the ith finger of node p succeeds n
• Transfer and publish keys
20
21
Failures
• When a node n fails, nodes whose finger tables include n must find n’s successor
• Each node maintains a “successor list” of its r nearest successors
• Replication
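A successor list lets a node skip over failed successors. The following is a minimal sketch assuming global knowledge of the ring and a known set of live nodes; the names `successor_list` and `first_live_successor` are hypothetical.

```python
def successor_list(n, ring, r):
    """The r nearest successors of node n on the identifier circle."""
    ordered = sorted(ring)
    start = ordered.index(n)
    count = min(r, len(ordered) - 1)  # a node is never its own successor here
    return [ordered[(start + i) % len(ordered)] for i in range(1, count + 1)]

def first_live_successor(succ_list, alive):
    """First entry of the successor list that is still alive, or None."""
    for node in succ_list:
        if node in alive:
            return node
    return None
```

For example, if node 8 keeps the list [14, 21] and node 14 fails, node 8 falls back to node 21.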
22
CAN (Content-Addressable Network)
The entire CAN space is divided amongst the nodes currently in the system.
The new node must find a node already in the CAN
Using the CAN routing mechanisms, it must find a node whose zone will be split
The neighbors of the split zone must be notified so that routing can include the new node
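The zone split on join can be sketched as follows. Several simplifications are assumptions of this example: zones are axis-aligned boxes, the split halves the zone along one chosen dimension, the new node receives the upper half, a linear scan stands in for CAN routing to the zone owner, and neighbor notification is omitted.

```python
def contains(zone, point):
    """True if `point` lies inside `zone`, a tuple of (low, high) per dimension."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

def split_zone(zone, dim):
    """Halve `zone` along dimension `dim`; returns (lower half, upper half)."""
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    lower = zone[:dim] + ((lo, mid),) + zone[dim + 1:]
    upper = zone[:dim] + ((mid, hi),) + zone[dim + 1:]
    return lower, upper

def join(zones, new_node, point, dim=0):
    """Split the zone that contains `point` and give one half to `new_node`.

    zones: dict mapping node name -> zone; modified in place.
    Returns the name of the node whose zone was split.
    """
    # Stand-in for routing: find the owner of the zone containing the point.
    owner = next(n for n, z in zones.items() if contains(z, point))
    kept, given = split_zone(zones[owner], dim)
    zones[owner] = kept      # the existing node keeps the lower half (arbitrary choice)
    zones[new_node] = given  # the new node takes the upper half
    return owner
```

Starting from a single node "A" that owns the unit square, a join at point (0.7, 0.2) leaves "A" with the left half and gives the right half to the new node.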
23
24
Node departure, recovery and CAN maintenance
One of the failed node’s neighbors takes over the zone
• (key, value) pairs held by the departing node are lost until the state is refreshed by the holders of the data
Takeover algorithm
• Each neighbor of the failed node starts a takeover timer, running independently
• When its timer expires, the node sends a TAKEOVER message conveying its own zone volume to all of the failed node’s neighbors
• Compare the volumes
– The node that is still alive and has the smallest zone volume is chosen
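The outcome of the volume comparison can be expressed directly. This sketch skips the timers and messages and only computes which neighbor ends up taking over; breaking ties by node name is an assumption of the example, as is the function name.

```python
def choose_takeover_node(neighbor_volumes, alive):
    """Pick the live neighbor with the smallest zone volume.

    neighbor_volumes: dict mapping neighbor name -> zone volume
    alive: set of neighbors that are still reachable
    Ties are broken by name (an assumption made for this sketch).
    """
    candidates = {n: v for n, v in neighbor_volumes.items() if n in alive}
    if not candidates:
        raise ValueError("no live neighbor can take over the zone")
    return min(candidates, key=lambda n: (candidates[n], n))
```

With volumes A = 0.25, B = 0.125 and C = 0.5, node B takes over; if B is also down, the zone goes to A.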
25
Design improvements
Multi-dimensional coordinate spaces
• Increasing the number of dimensions reduces the routing path length
RTT (round-trip time) weighted routing
• Aims at reducing the latency of individual hops along the path, not at reducing the path length
Overloading coordinate zones
• Allow multiple nodes to share the same zone
– A node maintains a list of its peers in addition to its neighbor list
• Advantages
– Reduced per-hop latency
– Improved fault tolerance
Multiple hash functions
• Improve data availability
• Map a single key onto k points (replication)
26
Hypercube routing
Node and object IDs are drawn from the same ID space, which can be thought of as a ring
Each node’s ID is represented by d digits of base b
• Example: 32-bit ID => 8 hex digits (b = 16)
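The digit representation can be computed with repeated division. A small sketch, with the function name `id_digits` chosen for this example and digits listed most significant first:

```python
def id_digits(node_id, d, b):
    """Represent node_id as d digits of base b, most significant digit first."""
    if not 0 <= node_id < b ** d:
        raise ValueError("id does not fit in d digits of base b")
    digits = []
    for _ in range(d):
        node_id, digit = divmod(node_id, b)  # peel off the lowest digit
        digits.append(digit)
    return digits[::-1]
```

For the 32-bit ID 0xDEADBEEF with b = 16 and d = 8 this yields the eight hex digits D, E, A, D, B, E, E, F as the integers 13, 14, 10, 13, 11, 14, 14, 15.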
27
Neighbor table
Each node’s neighbor table consists of d levels with b entries at each level
Each node also keeps track of its reverse-neighbors
28
Routing scheme
Example
Join protocol (index maintenance)
Single join
Multiple joins
• Sequential joins
• Concurrent joins
– Independent joins
– Dependent joins
29
YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology
Prasanna Ganesan, Qixiang Sun, Hector Garcia-Molina
IEEE INFOCOM 2003
30
Intro.
Build a small DHT consisting of nearby nodes and then provide an intelligent search mechanism that can traverse all the small DHTs
YAPPERS (Yet Another Peer-to-PEeR System) operates on top of an arbitrary overlay network.
Lookup service: partial lookup, total lookup
31
Key concept
If a node A wants to register a value for a white key <Kw, V1>, this pair can be stored at A itself, since A is also white.
For a gray key and its value <Kg, V2>, A looks for a neighboring gray node.
A query for a gray key needs to be forwarded only to gray nodes.
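The white/gray example can be sketched with a hash that maps both keys and nodes to a color. How colors are derived is an assumption here (SHA-1 of the key or of the node's address, reduced modulo the number of colors), and the function names are hypothetical.

```python
import hashlib

COLORS = ["white", "gray"]

def color_of(value, colors=COLORS):
    """Map a key or a node address (a string) to one of the colors."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return colors[int.from_bytes(digest, "big") % len(colors)]

def storage_candidates(node, key, neighbors):
    """Nodes at which `node` may register `key`.

    If the node has the key's color it stores the pair itself;
    otherwise it looks for neighbors of the key's color.
    """
    target = color_of(key)
    if color_of(node) == target:
        return [node]
    return [n for n in neighbors if color_of(n) == target]
```

A query for the same key is then forwarded only to nodes for which `color_of` returns the key's color.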
32
33
We call the nodes within h hops of a node its immediate neighborhood
The nodes within 2h+1 hops form its extended neighborhood
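Both neighborhoods can be computed with a hop-limited breadth-first search. In this sketch the overlay is a hypothetical adjacency dict, and the node itself (0 hops away) is counted as part of its own neighborhood, which is an assumption of the example.

```python
from collections import deque

def within_hops(graph, source, limit):
    """Set of nodes reachable from `source` in at most `limit` hops (source included)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if distance[node] == limit:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)

def immediate_neighborhood(graph, node, h):
    return within_hops(graph, node, h)

def extended_neighborhood(graph, node, h):
    return within_hops(graph, node, 2 * h + 1)

# Hypothetical overlay: a path 1 - 2 - 3 - 4 - 5 - 6.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
```

On this path with h = 1, node 1 has the immediate neighborhood {1, 2} and the extended neighborhood {1, 2, 3, 4}.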
34
Basic algorithm
Consistency: if a node X is in two different neighborhoods IN(A) and IN(B), both A and B assign the same color to node X
Stability: X is assigned the same color regardless of how IN(A) changes dynamically as nodes enter or leave
Stability reduces data relocation
The key assignment:
35
The immediate neighborhood
Multiple nodes in IN(X) have the same color?
Allow X to pick any one of these nodes to store the key
No nodes in IN(X) have color C?
Use a backup assignment scheme
36
Backup assignment
When there are no nodes in IN(X) that have color C_i, color C_i is assigned to a node with color C_{(i+1) mod b}. If there are multiple nodes of color C_{(i+1) mod b}, choose the node with the smallest IP address.
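The backup rule can be sketched as a small function over an immediate neighborhood. Assumptions in this example: colors are integers 0 to b-1, nodes are identified by IPv4 address strings, "smallest IP" means numeric order, and an empty list is returned when the backup color is missing as well (the rule above does not cover that case).

```python
import ipaddress

def responsible_nodes(i, neighborhood, b):
    """Nodes in the neighborhood that may hold keys of color i.

    neighborhood: dict mapping IP address string -> color index (0..b-1)
    """
    owners = [ip for ip, color in neighborhood.items() if color == i]
    if owners:
        # Any node of color i may store the key.
        return sorted(owners, key=ipaddress.ip_address)
    backup_color = (i + 1) % b
    backups = [ip for ip, color in neighborhood.items() if color == backup_color]
    if not backups:
        return []  # not covered by the rule above
    # Among the backup-colored nodes, the smallest IP address wins.
    return [min(backups, key=ipaddress.ip_address)]
```

With nodes 10.0.0.9 and 10.0.0.10 of color 1 and no node of color 0, color 0 falls back to 10.0.0.9, because 9 is numerically smaller than 10 even though "10.0.0.10" sorts first as a string.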
37
In resolving the pitfalls mentioned above, our solution is no longer consistent and stable as envisioned earlier.
By probabilistic analysis, it can be shown that if a node A has b log b nodes in IN(A), then with high probability there exists a node of each color.
38
Maintaining topology
Edge deletion: when deleting an edge (X,Y), both X and Y broadcast the deletion event to their surviving neighbors with a TTL of 2h.
Edge insertion: when adding an edge (X,Y), a “trim” technique will be performed by nodes connected to X and Y.
39
Enhancement
Fringe node problem solutions:
Pruning: if X is a fringe node, then X doesn’t participate in YAPPERS directly. It selects a nearby high-connectivity node Y as its proxy.
Biased backup: forbid a node with a small immediate neighborhood from assigning backup colors to a node with a large immediate neighborhood.
40
Requirements
Expressiveness
Work in P2P search has focused on answering simple queries
Types of queries
• Key lookup
• Keyword
• Range query
• Aggregates
• SQL
41
Autonomy, Efficiency and Robustness
Autonomy
• The freedom of nodes to join and leave
Efficiency
• Bandwidth, processing power
Robustness
• Stability in the presence of failures
42
Comprehensiveness
Quality of Service
Number of results
Response time
Relevance