Towards Efficient Load Balancing in Structured P2P Systems
Yingwu Zhu, Yiming Hu
University of Cincinnati
Outline
• Motivation and Preliminaries
• Load balancing scheme
• Evaluation
Why Load Balancing?
• Structured P2P systems, e.g., Chord, Pastry:
– Object IDs and node IDs are produced by a uniform hash function.
– This results in an O(log N) imbalance in the number of objects stored at each node.
• Skewed distribution of node capacity:
– Nodes should carry loads proportional to their capacities.
• Other problems: different object sizes, non-uniform distribution of object IDs.
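The O(log N) imbalance is easy to observe empirically: hashing node IDs uniformly onto the ring leaves some nodes responsible for regions far larger than average. A minimal sketch (ring size and node names are illustrative, not from the paper):

```python
import hashlib

RING_BITS = 32  # illustrative ring size: 2^32 identifier points

def node_id(name: str) -> int:
    """Uniformly hash a node name onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:4], "big")  # truncate to 32 bits

def arc_lengths(n: int):
    """Size of the ID-space region each of n nodes is responsible for."""
    ids = sorted(node_id(f"node-{i}") for i in range(n))
    ring = 1 << RING_BITS
    return [(ids[(i + 1) % n] - ids[i]) % ring for i in range(n)]

arcs = arc_lengths(1024)
ratio = max(arcs) / (sum(arcs) / len(arcs))
print(f"max/mean region size for 1024 nodes: {ratio:.1f}x")
```

With uniform hashing the largest region is typically around ln N times the average (roughly 7x for 1024 nodes), which is exactly the imbalance that virtual servers are meant to smooth out.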
Virtual Servers (VS)
• First introduced in Chord/CFS.
• A VS is responsible for a contiguous region of the ID space.
• A node can host multiple VSs.
[Figure: a Chord ring with virtual servers distributed among nodes A, B, and C.]
Virtual Server Reassignment
• The virtual server is the basic unit of load movement, allowing load to be transferred between nodes.
• L – load, T – target load.
[Figure: a Chord ring before reassignment. Node A (T=50, L=45) hosts virtual servers with loads 30 and 15; Node B (T=35, L=41) hosts loads 20, 11, and 10, and is heavy; Node C (T=15, L=3) hosts load 3.]
Virtual Server Reassignment (cont.)
[Figure: the same ring after reassignment. The heavy Node B transfers its load-11 virtual server to Node C, so Node B drops from L=41 to L=30 and Node C rises from L=3 to L=14; every node's load is now below its target.]
Advantages of Virtual Servers
• Flexible: load is moved in the unit of a virtual server.
• Simple:
– VS movement is supported by all structured P2P systems.
– Simulated by a leave operation followed by a join operation.
Current Load Balancing Solutions
• Some use the concept of virtual servers.
• However, they:
– Either ignore the heterogeneity of node capabilities,
– Or transfer loads without considering proximity relationships between nodes,
– Or both.
Goals
• Maintain each node's load below its target load (the maximum load a node is willing to take).
• High-capacity nodes take more load.
• Perform load balancing in a proximity-aware manner, to minimize the overhead of load movement (bandwidth usage) and enable faster, more efficient load balancing.
• Load depends on the particular P2P system, e.g., storage, network bandwidth, or CPU cycles.
Assumptions
• Nodes in the system are cooperative.
• Only one bottlenecked resource, e.g., storage or network bandwidth.
• The load of each virtual server is stable over the timescale when load balancing is performed.
Overview of Design
• Step 1: Load-balancing information (LBI) aggregation, e.g., load and capacity info.
• Step 2: Node classification, e.g., into heavy, light, and neutral nodes.
• Step 3: Virtual server assignment (VSA).
• Step 4: Virtual server transferring (VST).
• Proximity-aware load balancing: the VSA step is proximity-aware.
LBI Aggregation and Node Classification
• Rely on a fully decentralized, self-repairing, and fault-tolerant K-nary tree built on top of a DHT (distributed hash table).
• Each K-nary tree node is planted in a DHT node.
• <L, C, Lmin> represents the load, the capacity, and the minimum load of the virtual servers, respectively.
[Figure: a binary aggregation tree. The leaves report <12,10,2>, <15,8,3>, <20,10,5>, and <15,20,4>; the internal nodes aggregate them into <27,18,2> and <35,30,4>; the root obtains <62,48,2>.]
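The bottom-up aggregation can be sketched as a recursive merge of <L, C, Lmin> tuples; the tree shape and leaf values below mirror the slide's example (the representation is illustrative):

```python
# A tree is either a leaf tuple (L, C, Lmin) or a list of subtrees.
def aggregate(tree):
    """Bottom-up LBI aggregation over a K-nary tree."""
    if isinstance(tree, tuple):          # leaf: its own LBI
        return tree
    tuples = [aggregate(sub) for sub in tree]
    return (sum(t[0] for t in tuples),   # L: total load
            sum(t[1] for t in tuples),   # C: total capacity
            min(t[2] for t in tuples))   # Lmin: smallest VS load

# The slide's example: four leaves under two internal nodes.
tree = [[(12, 10, 2), (15, 8, 3)], [(20, 10, 5), (15, 20, 4)]]
print(aggregate(tree))  # → (62, 48, 2), as on the slide
```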
LBI Aggregation and Node Classification (cont.)
• The root's aggregate <62, 48, 2> is then propagated back down the tree, so every node learns the system-wide load L and capacity C.
• Each node i computes its target load Ti = (L/C) × Ci and classifies itself as heavy or light: the four leaves above become light, heavy, heavy, and light, respectively.
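Reading the slide's formula as Ti = (L/C) × Ci (target load proportional to capacity; any slack term in the original is not legible here), classification is a one-line comparison. A sketch with the slide's leaf values (node names are illustrative):

```python
def classify(nodes, total_load, total_capacity):
    """Label each (name, load, capacity) triple heavy/light/neutral
    against its target load Ti = (L/C) * Ci."""
    out = {}
    for name, load, cap in nodes:
        target = total_load / total_capacity * cap
        if load > target:
            out[name] = "heavy"
        elif load < target:
            out[name] = "light"
        else:
            out[name] = "neutral"
    return out

# The four leaf nodes from the aggregation example; root tuple <62, 48, 2>.
nodes = [("a", 12, 10), ("b", 15, 8), ("c", 20, 10), ("d", 15, 20)]
print(classify(nodes, 62, 48))
# → {'a': 'light', 'b': 'heavy', 'c': 'heavy', 'd': 'light'}
```

This reproduces the light/heavy/heavy/light labeling shown on the slide.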
Virtual Server Assignment
[Figure: heavy nodes H1 … Hm+1 report their virtual servers (e.g., V11, V12) and light nodes L1 … Ln+1 report their capacities (e.g., C1) as VSA information into rendezvous points, which pair them using best-fit heuristics. Unpaired VSA information is forwarded to a final rendezvous point. VSA happens earlier between logically closer nodes.]
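A rendezvous point can pair virtual servers from heavy nodes with light nodes' spare capacity using a best-fit heuristic: give each VS (heaviest first) to the light node with the least spare capacity that still fits, and forward whatever cannot be paired. A sketch under those assumptions (the exact pairing policy is not spelled out on the slide):

```python
def best_fit_assign(virtual_servers, spare):
    """virtual_servers: list of (vs_id, load) reported by heavy nodes.
    spare: dict light_node -> spare capacity.
    Returns (assignments, unpaired); unpaired VSA information is
    forwarded to the next rendezvous point."""
    assignments, unpaired = [], []
    remaining = dict(spare)
    for vs_id, load in sorted(virtual_servers, key=lambda v: -v[1]):
        # best fit: the tightest light node that can still take this VS
        fits = [n for n, c in remaining.items() if c >= load]
        if not fits:
            unpaired.append((vs_id, load))
            continue
        node = min(fits, key=lambda n: remaining[n])
        remaining[node] -= load
        assignments.append((vs_id, node))
    return assignments, unpaired

asg, un = best_fit_assign([("v1", 30), ("v2", 20), ("v3", 11)],
                          {"L1": 25, "L2": 35})
print(asg, un)  # → [('v1', 'L2'), ('v2', 'L1')] [('v3', 11)]
```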
Virtual Server Assignment
• DHT identifier space-based VSA:
– VSA happens earlier between logically closer nodes.
– Proximity-ignorant: nodes that are logically close in the DHT are NOT necessarily physically close together.
[Figure: heavy nodes H1, H2 and light nodes L1–L4; virtual servers V1–V3 are assigned between logically close but physically distant nodes. Nodes in the same colors are physically close to each other.]
Proximity-Aware VSA
• Nodes in the same colors are physically close to each other.
• H – heavy node, L – light node, Vi – virtual server.
• VSs are assigned between physically close nodes.
[Figure: the same nodes H1, H2 and L1–L4; virtual servers V1–V3 are now assigned between physically close heavy and light nodes.]
Proximity-Aware VSA
• Use landmark clustering to generate proximity information, e.g., landmark vectors.
• Use space-filling curves (e.g., the Hilbert curve) to map landmark vectors to Hilbert numbers, which serve as DHT keys.
• Heavy nodes and light nodes each put their VSA info into the underlying DHT under the resulting DHT keys, aligning physical closeness with logical closeness.
• Each virtual server independently reports the VSA info that is mapped into its responsible region, rather than its own node's VSA info.
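For intuition, the landmark-vector-to-key step can be sketched in two dimensions with the classic xy-to-distance Hilbert recurrence (the paper's landmark vectors are higher-dimensional; this 2-D version only illustrates the locality property that nearby points get nearby keys):

```python
def hilbert_key(order: int, x: int, y: int) -> int:
    """Map a point on a 2^order x 2^order grid to its distance along
    the Hilbert curve (standard bit-twiddling recurrence)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so the recursion stays consistent
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Physically nearby points (close landmark vectors) get nearby keys:
print(hilbert_key(4, 0, 0), hilbert_key(4, 1, 0), hilbert_key(4, 1, 1))
# → 0 1 2
```

Because consecutive Hilbert keys always belong to grid-adjacent cells, keying VSA reports by Hilbert number makes physically close nodes meet at nearby DHT identifiers.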
Proximity-Aware Virtual Server Assignment
[Figure: the same rendezvous structure, but with VSA information keyed by Hilbert numbers, so best-fit pairing happens earlier between physically closer nodes.]
Experimental Setup
• A K-nary tree built on top of a DHT (Chord), with k = 2 and k = 8.
• Two node capacity distributions:
– Gnutella-like capacity profile (5 capacity levels).
– Zipf-like capacity profile.
• Two load distributions of virtual servers: Gaussian and Pareto.
• Two transit-stub topologies (5,000 nodes each): "ts5k-large" and "ts5k-small".
High Capacity Nodes Carry More Loads
[Figure: results under a Gaussian load distribution with the Gnutella-like capacity profile.]
High Capacity Nodes Carry More Loads
[Figure: results under a Pareto load distribution with the Zipf-like capacity profile.]
Proximity-Aware Load Balancing
[Figures: CDF of moved-load distribution in ts5k-large, under (a) a Gaussian load distribution with the Gnutella-like capacity profile and (b) a Pareto load distribution with the Zipf-like capacity profile.]
• More load is moved over shorter distances by proximity-aware load balancing.
Benefit of Proximity-Aware Scheme
• Load movement cost: C = Σ_d LM(d) × d, where LM(d) denotes the load moved over a distance of d hops.
• Benefit: B = (C_basic − C_proximity) / C_basic, the fraction of movement cost saved relative to the proximity-oblivious scheme.
• Results:
– For ts5k-large: B = 37–65%.
– For ts5k-small: B = 11–20%.
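If the movement cost is taken as the distance-weighted sum Σ_d LM(d) × d (a natural reading of the slide; the original formula images are missing), the benefit is the relative cost reduction versus a proximity-oblivious baseline. A small sketch with illustrative, made-up moved-load distributions:

```python
def movement_cost(lm):
    """lm: dict mapping hop distance d -> load LM(d) moved over d hops.
    Cost weights each unit of moved load by the distance it travels."""
    return sum(d * load for d, load in lm.items())

def benefit(lm_oblivious, lm_aware):
    """Relative movement cost saved by the proximity-aware scheme."""
    c0, c1 = movement_cost(lm_oblivious), movement_cost(lm_aware)
    return (c0 - c1) / c0

# Illustrative distributions (not measured data): both schemes move
# the same total load, but the aware one moves it over shorter hops.
oblivious = {2: 10, 6: 30, 10: 60}
aware = {2: 60, 6: 30, 10: 10}
print(f"B = {benefit(oblivious, aware):.0%}")  # → B = 50%
```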
Other Results
• Quantified the overhead of K-nary tree construction: link stress and node stress.
• The latencies of LBI aggregation and VSA are bounded by O(log N) time.
• The effect of the pairing threshold at rendezvous points.
Conclusions
• Current load balancing approaches using virtual servers have limitations:
– Either they ignore node capacity heterogeneity,
– Or they transfer loads without considering proximity relationships between nodes,
– Or both.
• Our solution:
– A fully decentralized, self-repairing, and fault-tolerant K-nary tree is built on top of DHTs to perform load balancing.
– Nodes carry loads in proportion to their capacities.
– The first work to address the load balancing issue in a proximity-aware manner, thereby minimizing the overhead of load movement and allowing more efficient load balancing.
Questions?