TRANSCRIPT
Prof. Dr. Thomas Schmidt http://www.informatik.haw-hamburg.de/~schmidt
Structured Peer-to-Peer Networks
- Properties & Programming -
Outline:
Illustration: DHT vs. DNS
Programming a DHT
The Dabek Model
Properties + Aspects of DHTs: Load Balancing, Reliability, Delay Stretch, Churn
Some graphics taken from: R. Steinmetz, K. Wehrle: Peer-to-Peer Systems and Applications, Springer LNCS 3485, 2005
Illustration: DHT vs. DNS
Comparison DHT vs. DNS
Traditional name services follow a fixed mapping
DNS maps a logical node name to an IP address
DHTs offer a flat / generic mapping of keys to values
Not bound to particular applications or services
The „value“ in (key, value) may be an address,
a document,
or other data …
Comparison: DHT vs. DNS
Domain Name System
Mapping: symbolic name → IP address
Built on a hierarchical structure with root servers
Names refer to administrative domains
Specialized + optimised to search for computer names and services
Faster, but inflexible

Distributed Hash Table
Mapping: key → value; can easily realize DNS
Does not need a special server
Does not require a special name space
Can find data that are located independently of particular computers
Fully flexible, but slower
There are several Chord-DNS projects
Programming a DHT
Two communication interfaces:
One towards the application layer (user of the DHT)
One towards other nodes within the DHT
Both functions are similar
The node-to-node interface must be network transparent; choice of:
Application layer protocol (using TCP or UDP sockets)
Remote procedure calls (RPCs)
Remote Method Invocation (RMI/CORBA)
Web services …
The application layer interface may be local or distributed
DHT Data Interfaces
Generic interface of distributed hash tables:
Provisioning of information: Publish(key, value)
Requesting of information (search for content): Lookup(key)
Reply: value
DHT approaches are interchangeable (with respect to interface)
[Figure: a distributed application issues Put(key, value) and Get(key) → value against a distributed hash table (CAN, Chord, Pastry, Tapestry, …) spanning nodes 1 … N]
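The generic data interface above can be sketched in a few lines. This is a minimal illustration, not a real overlay: the `ToyDHT` class, its names, and the modulo node selection are assumptions standing in for consistent hashing across real nodes.

```python
# Minimal sketch of the generic DHT data interface (Publish/Lookup,
# often named put/get). Node selection is simplified: hash the key
# into the identifier space and map it onto one of N local stores.
import hashlib

class ToyDHT:
    def __init__(self, num_nodes):
        # one dict per node stands in for each node's local store
        self.nodes = [{} for _ in range(num_nodes)]

    def _responsible(self, key):
        # hash the key, then pick the responsible node
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):          # Publish(key, value)
        self.nodes[self._responsible(key)][key] = value

    def get(self, key):                 # Lookup(key) -> value or None
        return self.nodes[self._responsible(key)].get(key)
```

Because all approaches share this interface, swapping Chord for Pastry or CAN would only change what happens inside `_responsible`, not the put/get contract seen by the application.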
DHT Self Organisation Interface
Join(mykey): Retrieve ring successor & predecessor (neighbours), initiate key transfer
Leave(): Transfer predecessor and keys to successor
Maintain predecessor & successor (neighbour) list, e.g., stabilize in Chord
Maintain routing table, e.g., fix_fingers in Chord
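The join/leave operations above can be sketched on a ring modelled as a sorted list of node IDs. This is a hedged toy model: it maintains only ring membership and reports where keys would move; stabilize and fix_fingers are omitted, and the `Ring` class is an illustrative assumption, not Chord's actual data structure.

```python
# Sketch of the self-organisation interface on a Chord-like ring.
import bisect

class Ring:
    def __init__(self):
        self.ids = []                      # sorted node IDs on the ring

    def successor(self, key):
        # first node ID >= key, wrapping around the ring
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def join(self, my_id):
        # Join(mykey): determine successor & predecessor, enter the ring;
        # keys in (predecessor, my_id] would now transfer to this node
        bisect.insort(self.ids, my_id)
        n = len(self.ids)
        i = self.ids.index(my_id)
        return self.ids[(i - 1) % n], self.ids[(i + 1) % n]

    def leave(self, my_id):
        # Leave(): keys would be transferred to the successor
        self.ids.remove(my_id)
```

A periodic stabilize routine would re-derive the same predecessor/successor pointers to repair the ring after unannounced failures.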
Dabek Model
Layered approach towards a “unified overlay routing”
Core idea: KBR layer (Tier 0) as a routing abstraction on (interchangeable) structured schemes
Tier 1: General services
Tier 2: Higher layer services and applications
Common KBR API
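The common KBR API proposed by Dabek et al. centres on a `route(key, msg)` call plus upcalls `forward` (at intermediate hops) and `deliver` (at the root node for the key). The sketch below is an assumption-laden toy: the flat node list and the closest-ID hop rule stand in for a real structured overlay's routing tables.

```python
# Sketch of the common KBR API (Tier 0): route(key, msg) moves a
# message towards the node whose ID is closest to the key; forward()
# fires at intermediate nodes, deliver() at the final (root) node.

class KBRNode:
    def __init__(self, node_id, ring):
        self.id, self.ring = node_id, ring
        self.delivered = []                 # messages this node received

    def deliver(self, key, msg):            # upcall at the root node
        self.delivered.append((key, msg))

    def forward(self, key, msg, next_hop):  # upcall at intermediate hops
        return True                         # True = keep forwarding

    def route(self, key, msg):
        # root = node with ID numerically closest to the key
        root = min(self.ring, key=lambda n: abs(n.id - key))
        if root is self:
            self.deliver(key, msg)
        elif self.forward(key, msg, root):
            root.route(key, msg)
```

Tier-1 services (DHT storage, multicast, anycast) are then built purely on `route`/`deliver`, which is what makes the structured schemes underneath interchangeable.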
Load Balancing in DHTs
Standard assumption: uniform key distribution via the hash function
Every node carries an equal load
No load balancing is needed
Equal distribution of
nodes across the address space
data across nodes
But is this assumption justifiable?
Analysis of the distribution of data using simulation
Analysis of distribution of data
Example parameters:
4,096 nodes
500,000 documents
Optimum: ~122 documents per node
No optimal distribution in Chord w/o load balancing
[Figure: documents per node, Chord without load balancing vs. the optimal distribution of documents across nodes]
Chord without Load Balancing (cont'd)
Number of nodes without storing any document
Parameters:
4,096 nodes
100,000 to 1,000,000 documents
Some nodes w/o any load
Why is the load unbalanced? – statistical reasons
We need load balancing to keep the complexity of DHT management low
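The statistical effect can be reproduced in a few lines: even with a uniform hash, random node positions produce intervals of very different sizes, so some nodes end up empty. The simulation below mirrors the slide parameters; the labels `node-i` / `doc-d` and the 32-bit ring are arbitrary assumptions, so absolute counts will vary with the inputs.

```python
# Simulate Chord-style placement: hash node IDs and document keys onto
# a ring, assign each document to its successor node, count empty nodes.
import bisect
import hashlib

def ring_pos(label):
    # position in [0, 2^32), taken from the first 4 bytes of SHA-1
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:4], "big")

def empty_nodes(num_nodes, num_docs):
    node_pos = sorted(ring_pos(f"node-{i}") for i in range(num_nodes))
    load = [0] * num_nodes
    for d in range(num_docs):
        # Chord rule: a document belongs to the first node at/after it
        i = bisect.bisect_left(node_pos, ring_pos(f"doc-{d}")) % num_nodes
        load[i] += 1
    return sum(1 for l in load if l == 0)

# with the slide parameters, a noticeable fraction of nodes stores nothing
print(empty_nodes(4096, 100_000))
```

This matches the expectation for random intervals: the probability that a node's interval receives no document stays well above zero even as the document count grows.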
Load Balancing Algorithms
Several techniques ensure an equal data distribution:
Power of Two Choices (Byers et al., 2003)
Virtual Servers (Rao et al., 2003)
Thermal-Dissipation-based Approach (Rieche et al., 2004)
A Simple Address-Space and Item Balancing (Karger et al., 2004)
Example: Power of Two Choices
Idea
One hash function for all nodes: h0
Multiple hash functions for data: h1, h2, h3, … hd
Two options:
Data is stored at one node
Data is stored at one node & other nodes store a pointer
Power of Two Choices (2)
Inserting Data
Results of all hash functions are calculated: h1(x), h2(x), h3(x), … hd(x)
Data is stored on the candidate node with the lowest load
Alternative: the other nodes store a pointer
The owner of a datum has to insert the document periodically
Prevents removal of data after a timeout (soft state)
Power of Two Choices (3)
Retrieving
Without pointers:
Results of all hash functions are calculated
Request all of the possible nodes in parallel
One node will answer
With pointers:
Request only one of the possible nodes
Node can forward the request directly to the final node
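Insertion and pointer-less retrieval can be sketched as follows. The d salted hash functions, the class name, and the fixed node count are illustrative assumptions; a real deployment would derive h1…hd differently and route each request through the overlay.

```python
# Sketch of Power of Two Choices (generalized to d choices): d hash
# functions yield d candidate nodes; insert picks the least-loaded
# candidate, retrieval without pointers queries all d candidates.
import hashlib

def candidates(key, num_nodes, d=2):
    # h1..hd simulated by salting one hash function -- an assumption
    return [int(hashlib.sha1(f"{i}:{key}".encode()).hexdigest(), 16)
            % num_nodes for i in range(1, d + 1)]

class TwoChoiceDHT:
    def __init__(self, num_nodes, d=2):
        self.stores = [{} for _ in range(num_nodes)]
        self.d = d

    def insert(self, key, value):
        # store on the least-loaded of the d candidate nodes
        cands = candidates(key, len(self.stores), self.d)
        target = min(cands, key=lambda n: len(self.stores[n]))
        self.stores[target][key] = value

    def lookup(self, key):
        # without pointers: ask every candidate, one of them answers
        for n in candidates(key, len(self.stores), self.d):
            if key in self.stores[n]:
                return self.stores[n][key]
        return None
```

The trade-off listed on the next slide is visible directly: `insert` probes d nodes for their load, and `lookup` contacts up to d nodes unless pointers are maintained.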
Résumé: Power of Two Choices
Advantages
Simple
Disadvantages
Message overhead when inserting data
With pointers: additional administration of pointers, more load
Without pointers: message overhead at every search
Reliability in Distributed Hash Tables
Chord problems:
Unreliable nodes
Inconsistent connections
Loss of data
Successor list:
Stored by every node
f nearest successors clockwise on the ring
[Figure: a node's successor list pointing to its f nearest successors on the ring]
Reliability of Data in Chord
Originally: no reliability of data
Recommendation: use the successor list
The reliability of data is an application task
Replicate inserted data to the next f other nodes
Chord informs the application of arriving or failing nodes
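The recommended application-level replication can be sketched as two helpers. The list-of-dicts ring and the function names are illustrative assumptions; a real application would send the replicas over the node-to-node interface.

```python
# Sketch of replicating inserted data to the next f ring successors,
# and of answering a lookup when some of those nodes have failed.

def replicate_put(stores, home, key, value, f):
    """Store (key, value) on the home node and its f ring successors."""
    n = len(stores)
    for i in range(f + 1):
        stores[(home + i) % n][key] = value

def lookup_after_failure(stores, home, key, failed):
    """Walk the successor chain, skipping failed nodes."""
    n = len(stores)
    for i in range(n):
        node = (home + i) % n
        if node not in failed and key in stores[node]:
            return stores[node][key]
    return None
```

This is exactly why the next slide's advantage holds: after a node fails, its successor already stores the data, at the cost of each node holding f extra intervals.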
Properties
Advantages:
After failure of a node, its successor already stores the data
Disadvantages:
A node stores f intervals
More data load
After breakdown of a node: find the new successor, replicate data to the next node
More message overhead at breakdown
The stabilize function has to check every successor list to find inconsistent links
More message overhead
Multiple Nodes in One Interval
Fixed positive number f
Indicates how many nodes at least have to act within one interval
Procedure:
The first node takes a random position
A new node is assigned to an existing node
The node is announced to all other nodes in the same interval
[Figure: ring partitioned into intervals 1–10, each interval served by multiple nodes]
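The procedure can be sketched as an interval membership structure. The placement rule below (join the interval with the fewest members) is an illustrative assumption; the slides only require that a new node be assigned to an existing node's interval and announced to its peers.

```python
# Sketch of the multiple-nodes-per-interval scheme: the ring is split
# into fixed intervals, each of which should be served by at least f
# nodes; joining nodes become known to all peers of their interval.

class IntervalRing:
    def __init__(self, num_intervals, f):
        self.members = [[] for _ in range(num_intervals)]
        self.f = f

    def join(self, node):
        # assumed placement rule: fill the emptiest interval first
        idx = min(range(len(self.members)),
                  key=lambda i: len(self.members[i]))
        self.members[idx].append(node)   # announce to interval peers
        return idx

    def under_replicated(self):
        # intervals that still have fewer than f responsible nodes
        return [i for i, m in enumerate(self.members) if len(m) < self.f]
```

Tracking under-replicated intervals is what lets the scheme rebuild with neighbours only when the member count becomes critical, as noted two slides below.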
Multiple Nodes in One Interval (2)
Effects of algorithm
Reliability of data
Better load balancing
Higher security
Reliability of Data: Insertion
Copy of documents: always necessary for replication
Less additional expense: nodes only have to store pointers to nodes from the same interval
Nodes store only data of one interval
Reliability of Data
Reliability: on failure, no copy of data is needed
Data are already stored within the same interval
Use the stabilization procedure to correct fingers, as in original Chord
Properties
Advantages
Failure: no copy of data needed
Rebuild intervals with neighbors only if critical
Requests can be answered by f different nodes
Disadvantages
Fewer intervals than in original Chord
Network Efficiency of DHTs: Delay Stretch
DHT look-up is based on network layer routing in the underlay
A measure of the network layer efficiency of a DHT is the relative increase in underlay path length: the delay stretch
Def.: for the consecutive overlay hops (x, y) on the path from u to v (distances taken in the underlay):
d = ( Σ over overlay hops (x, y) of dist_underlay(x, y) ) / dist_underlay(u, v)
i.e., the number of underlay hops from u to v taken via the overlay is d · dist_underlay(u, v)
Delay Stretch (2)
Two types of DHTs:
Topology unaware (Chord, CAN):
Delay stretch d = number of overlay hops
Average number increases with overlay nodes
Topology aware (Pastry, CAN with Landmarking)
Delay stretch (significantly) reduced
Achievable: delay stretch constant w.r.t. the number of overlay nodes (with a constant value of about 1.5, see Dabek et al.)
Enhanced importance in mobile ad hoc networks
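The definition of delay stretch lends itself to a small worked example. The line-shaped underlay and the node positions below are toy assumptions chosen so the distances are easy to check by hand.

```python
# Worked example of delay stretch: sum the underlay distances of the
# consecutive overlay hops from u to v, divide by the direct underlay
# distance dist_underlay(u, v).

def delay_stretch(overlay_path, dist):
    """overlay_path: node sequence u..v; dist(a, b): underlay distance."""
    u, v = overlay_path[0], overlay_path[-1]
    via_overlay = sum(dist(a, b)
                      for a, b in zip(overlay_path, overlay_path[1:]))
    return via_overlay / dist(u, v)

# toy underlay: nodes on a line, distance = absolute difference
dist = lambda a, b: abs(a - b)
print(delay_stretch([0, 7, 3, 10], dist))  # (7 + 4 + 7) / 10 = 1.8
```

A topology-unaware overlay routinely produces detours like the 0 → 7 → 3 → 10 path above; topology-aware neighbour selection keeps each overlay hop short in the underlay, pushing the ratio towards 1.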
Churn
DHTs are designed to handle node volatility
Rapid, continuous arrival and departure of nodes is denoted as Churn
Possible metric: nodes’ session times in the DHT
Problem: DHT fault resilience & recovery may occur at a timescale larger than nodes’ sessions
May cause long latencies, message loss & inconsistencies
Empirical studies indicate presence of Churn phenomena
Increased importance in the ‘wireless last mile’ and MANETs
Robustness under Churn
Resilience to node failures is essentially achieved by:
Reactive recovery
Copy neighbour sets on loss detection (Pastry, CAN)
Risk: positive feedback cycle under congestion and churn
Periodic recovery
Fix neighbour sets independent of loss (Chord)
Higher load without churn, persistence of defective entries
Mitigation: reactive recovery after multiple timeouts
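The mitigation named above can be sketched as a hybrid: probe neighbours on a periodic timer, and trigger a reactive repair only after several consecutive missed probes. The class, its names, and the miss-counting scheme are illustrative assumptions.

```python
# Sketch of periodic recovery with a reactive fallback: neighbours are
# probed every stabilization round; a repair is triggered only after a
# neighbour misses timeout_limit consecutive probes.

class NeighbourTable:
    def __init__(self, neighbours, timeout_limit=3):
        self.misses = {n: 0 for n in neighbours}
        self.timeout_limit = timeout_limit
        self.repaired = []                 # log of triggered repairs

    def periodic_stabilize(self, alive):
        # timer-driven: probe every neighbour, regardless of failures
        for n in list(self.misses):
            if n in alive:
                self.misses[n] = 0
            else:
                self.misses[n] += 1
                if self.misses[n] >= self.timeout_limit:
                    # reactive recovery only after repeated timeouts
                    self.repaired.append(n)
                    del self.misses[n]
```

Waiting for multiple timeouts avoids the positive feedback cycle of purely reactive recovery: a single probe lost to congestion no longer triggers a burst of repair traffic.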
Reactive versus Periodic Recovery
Proximity Neighbour Selection under Churn
Proximity Neighbour Selection limits delay stretch, but takes extra effort to measure proximity
Under Churn, proximity detection may become worthless, as nodes may leave shortly after detected
Improvements to global sampling:
Sample on neighbour’s neighbours
Sample on neighbour’s inverse neighbours - those, who have (almost) the same neighbours
References
• S. Rieche, H. Niedermayer, S. Götze, K. Wehrle: Reliability and Load Balancing in DHTs, in R. Steinmetz, K. Wehrle: Peer-to-Peer Systems and Applications, Springer LNCS 3485, 2005.
• F. Dabek, J. Li, E. Sit, J. Robertson, M. F. Kaashoek, R. Morris: Designing a DHT for Low Latency and High Throughput, in Proc. NSDI, 2004.
• S. Rhea, D. Geels, T. Roscoe, J. Kubiatowicz: Handling Churn in a DHT, in Proc. of the 2004 USENIX Annual Technical Conference, Boston, 2004.