divide and conquer algorithms for pub/sub overlay design
Post on 26-Jan-2016
MIDDLEWARE SYSTEMS RESEARCH GROUP
Divide and Conquer Algorithms for Pub/Sub Overlay Design
Chen Chen 1
joint work with Hans-Arno Jacobsen 1,2, Roman Vitenberg 3
1 Department of Electrical and Computer Engineering, University of Toronto
2 Department of Computer Science, University of Toronto
3 Department of Informatics, University of Oslo
ICDCS’10 Genoa, Italy 1
Example: Pub/Sub
[Figure: three subscribers with interests "boy", "boy", and "girl"; publications on the topics "boy" and "girl".]
Pub/Sub
• A communication paradigm
  – Subscribers express their interests
  – Publishers disseminate messages
• Many applications and industry standards
  – Application integration, financial data dissemination, RSS feed distribution, business process management
  – WS Notifications, WS Eventing, OMG’s Real-time Data Dissemination Service
• Topic-based pub/sub
  – TIBCO RV
  – Google’s GooPS
Two components in pub/sub implementation
Design of routing protocols
• The design of protocols so that publications and subscriptions are sent most efficiently across the overlay network.
• G. Li et al., ICDCS’08
• M. Castro et al., JSAC’02
Construction of overlay
• The construction of the overlay topology such that network traffic is minimized.
• Chockler et al., PODC’07
• Onus et al., INFOCOM’09
Desirable properties for overlays
• Low average node degree
• Low fan-out of a node
• Low diameter
• Topic-connectivity
• Efficiency to construct
• Adaptability to churn
• Ease of distributed implementation
Our contributions
Previous algorithm: GM
• High running time cost
• Full knowledge requirement
• Centralized operation (difficult to decentralize)
• No support for dynamic changes
• Construction from scratch only (no support for incremental addition)
Our algorithms
• Low running time cost
• Partial knowledge requirement
• Centralized operation (easy to decentralize)
• No direct support for dynamic changes
• Construction both from scratch and incrementally
Topic-connectivity
[Figure: an overlay G over the nodes v1 {b,c,d}, v2 {a}, v3 {b,d}, v4 {a,b}, v5 {a,c}, shown next to two of its sub-overlays. Sub-overlay Ga (nodes v2, v4, v5, all interested in topic a) is topic-connected. Sub-overlay Gb (nodes v1, v3, v4, all interested in topic b) is NOT topic-connected.]
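The check illustrated on this slide can be sketched in a few lines of Python: a topic is connected if the nodes interested in it induce a connected sub-overlay. The node interests below are read off the figure labels; the edge set is hypothetical, since the drawn edges do not survive in this transcript, and was chosen so that topic a is connected and topic b is not. The function names are illustrative only.

```python
from collections import deque

def is_topic_connected(interest, edges, topic):
    """True if the nodes interested in `topic` induce a connected sub-overlay."""
    members = {v for v, topics in interest.items() if topic in topics}
    if len(members) <= 1:
        return True
    # Keep only the edges whose two endpoints are both interested in the topic.
    adj = {v: set() for v in members}
    for u, w in edges:
        if u in members and w in members:
            adj[u].add(w)
            adj[w].add(u)
    # Breadth-first search from an arbitrary member.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == members

def is_tco(interest, edges):
    """True if the overlay is topic-connected for every topic."""
    topics = set().union(*interest.values())
    return all(is_topic_connected(interest, edges, t) for t in topics)

# Interests as labelled in the figure; the edges are a HYPOTHETICAL example.
interest = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
edges = [("v5", "v2"), ("v2", "v4"), ("v1", "v5"), ("v3", "v4")]

connected_a = is_topic_connected(interest, edges, "a")
connected_b = is_topic_connected(interest, edges, "b")
```

With this edge set, topic a is connected through v5, v2, v4, while v1 has no neighbor interested in b, so the overlay as a whole is not a TCO.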
MinAvg-TCO problem
[Figure: two topic-connected overlays over the same five nodes v1 {b,c,d}, v2 {a}, v3 {b,d}, v4 {a,b}, v5 {a,c}. TCO1 has 5 edges; TCO2 has 10 edges.]
MinAvg-TCO problem
• A high-quality overlay
  – Topic-connectivity
  – Total number of edges
• Input:
  – a set of nodes V,
  – a set of topics T,
  – the interest function Int
• MinAvg-TCO(V,T,Int) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum.
• Avg-TCO(V,T,Int,k) (decision version): is there a TCO(V,T,Int,E) such that |E| = k?
• Theorem: MinAvg-TCO is NP-complete
[Figure: the five-node example overlay.]
Greedy-Merge (GM) algorithm
• Greedy: always make the choice that looks best at the moment
• GM for MinAvg-TCO: always add an edge with maximum link contribution
• Running time: O(|V|^2 |T|)
• Approximation ratio: O(log(|V||T|))
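A minimal sketch of the greedy rule above, under one assumption that the slide does not spell out: the link contribution of an edge (u, w) is taken to be the number of topics that both u and w are interested in and for which u and w currently lie in different topic-connected components. The sketch recomputes contributions naively on every round, so it does not attempt to meet the running time bound quoted on the slide. All names are illustrative.

```python
from itertools import combinations

class DisjointSet:
    """Union-find over the nodes interested in one topic."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def greedy_merge(interest):
    """Start from an empty edge set; repeatedly add the edge with maximum contribution."""
    topics = set().union(*interest.values())
    comps = {t: DisjointSet([v for v in interest if t in interest[v]]) for t in topics}

    def contribution(u, w):
        # ASSUMED definition: topics shared by u and w whose components the edge would merge.
        return sum(1 for t in interest[u] & interest[w]
                   if comps[t].find(u) != comps[t].find(w))

    edges = []
    candidates = list(combinations(sorted(interest), 2))
    while True:
        best = max(candidates, key=lambda e: contribution(*e), default=None)
        if best is None or contribution(*best) == 0:
            return edges  # no edge helps any topic: the overlay is topic-connected
        u, w = best
        for t in interest[u] & interest[w]:
            comps[t].union(u, w)
        edges.append(best)

# The five-node example used on the earlier slides.
interest = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
gm_edges = greedy_merge(interest)
```

On this input the first edge chosen is (v1, v3), the only one that helps two topics (b and d) at once.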
TCO join problem
• Given p TCOs: TCO_d(V_d, T_d, Int_d, E_d), d = 1, ..., p
• MinAvg-TCO-Join(V,T,Int,p) (optimization version): construct a TCO(V,T,Int,E) such that |E| is minimum
• Avg-TCO-Join(V,T,Int,p,k) (decision version): is there a TCO(V,T,Int,E) such that |E| = k?
• MinAvg-TCO is a special case of MinAvg-TCO-Join (p = |V|, with each node forming its own single-node TCO)
• Theorem: MinAvg-TCO-Join is NP-complete
Solving MinAvg-TCO-Join
• MinAvg-TCO-Join could be solved by GM, but this is NOT practical:
  – Tear down all existing links
  – Rebuild the overlay from scratch using GM
• It is better to preserve all existing edges and only add edges incrementally.
Bad case for incremental addition of edges
TCO0:
$V = \{v_1, v_2, \ldots, v_n\}$
$T = \{t_{ij} \mid 1 \le i, j \le n\}$
$T_{v_i} = \{t_{ij} \mid j = 1, \ldots, n\}$
v_all: interested in all topics in T
[Figure: TCO0 over v_1, ..., v_n; TCO1, constructed incrementally by adding v_all to TCO0; TCO2, constructed from scratch over v_1, ..., v_n and v_all.]
Number of edges: TCO0: $\Theta(n^2)$; TCO1 (constructing incrementally): $\Theta(n^2)$; TCO2 (constructing from scratch): $\Theta(n)$
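The edge counts on this slide can be worked out explicitly under one assumption that the transcript does not state: topics are symmetric, $t_{ij} = t_{ji}$, so that for $i \ne j$ the nodes $v_i$ and $v_j$ are the only nodes of TCO0 interested in $t_{ij}$. The exact constants below follow from that assumption and are a sketch, not a result quoted from the slides.

```latex
% Assumption (not on the slide): t_{ij} = t_{ji}.
\begin{align*}
|E(\mathrm{TCO}_0)| &= \binom{n}{2} = \frac{n(n-1)}{2} = \Theta(n^2)
  && \text{each pair } \{v_i, v_j\} \text{ needs its own edge for } t_{ij} \\
|E(\mathrm{TCO}_1)| &= \frac{n(n-1)}{2} + n = \Theta(n^2)
  && \text{keep every edge of } \mathrm{TCO}_0 \text{ and link } v_{all} \text{ to each } v_i \text{ for } t_{ii} \\
|E(\mathrm{TCO}_2)| &= n = \Theta(n)
  && \text{a star centered at } v_{all} \text{ connects every topic}
\end{align*}
```

In the star, any topic $t_{ij}$ is shared by $v_i$, $v_j$, and $v_{all}$, and the path $v_i - v_{all} - v_j$ keeps them connected, which is why the from-scratch overlay needs only a linear number of edges.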
Naive Merge (NM) algorithm
GM algorithm
• Input: (V,T,Int)
• Output: one TCO
• Algorithm:
  – Start with an empty edge set;
  – Always add an edge with maximum link contribution.
• Running time: $O(|V|^2 |T|)$
NM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  – Start with existing internal-TCO links;
  – Always add a cross-TCO edge with maximum link contribution.
• Running time: $O(|T| \sum_{i=1}^{p} \sum_{j>i}^{p} |V_i| |V_j|)$
NM is based on the same greedy heuristic as GM.
Example of NM
[Figure: NM merging sub-TCOs over the nodes v0 to v14, whose interests are drawn from the topics {a,b,c,d}.]
Still a prohibitively high running time: $O(|T| \sum_{i=1}^{p} \sum_{j>i}^{p} |V_i| |V_j|)$
Star set
Given a TCO (V,T,Int,E), a star set S is a subset of V that covers all of V’s topics.
[Figure: a topic-connected overlay over the nodes v1 {b,c,d}, v2 {a}, v3 {b,d}, v4 {a,b}, v5 {a,c}.]
• {v3, v5} is a star set: it covers all topics {a,b,c,d}
• {v2, v3, v4} is not a star set: it only covers {a,b,d}
Star set
• Star set nodes
  – Represent the interests of all the nodes
  – Can function as bridges to determine cross-TCO links
• Observation: minimal star sets tend to contain substantially fewer nodes than V.
• How to find a minimum star set S* for (V,T,Int)?
  – Equivalent to the classic set cover problem: NP-complete
  – Can be approximated with a logarithmic approximation ratio
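Since finding a minimum star set is a set cover problem, the standard greedy set cover heuristic gives a logarithmic approximation. The sketch below applies it to the five-node example; the slides do not say which approximation the authors use, so the greedy rule and the function names are assumptions made for illustration.

```python
def is_star_set(interest, candidate):
    """True if the nodes in `candidate` together cover every topic of the overlay."""
    all_topics = set().union(*interest.values())
    covered = set().union(*(interest[v] for v in candidate))
    return covered == all_topics

def greedy_star_set(interest):
    """Greedy set cover: repeatedly take the node covering the most uncovered topics."""
    uncovered = set().union(*interest.values())
    star = []
    while uncovered:
        # Ties are broken by node name so that the result is deterministic.
        best = max(sorted(interest), key=lambda v: len(interest[v] & uncovered))
        star.append(best)
        uncovered -= interest[best]
    return star

interest = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
star = greedy_star_set(interest)
```

On this input the heuristic first takes v1, which covers three topics, and then needs one more node for topic a, so it returns a two-node star set, the same size as the {v3, v5} example from the slide.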
Star Merge (SM) algorithm
NM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  – Start with existing internal-TCO links;
  – // Do nothing;
  – Always add a cross-TCO edge with maximum link contribution.
SM algorithm
• Input: (V_d, T_d, Int_d, E_d), d = 1, ..., p
• Output: one TCO
• Algorithm:
  – Start with existing internal-TCO links;
  – Find a star set for each sub-TCO;
  – Always add a cross-star edge with maximum link contribution.
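The only difference between NM and SM is the pool of candidate edges the greedy step chooses from. The sketch below builds both pools for two small, hypothetical sub-TCOs (the node names, interests, and helper functions are invented for illustration) and uses a greedy set cover as the star set heuristic, which is an assumption rather than something the slides specify.

```python
from itertools import combinations

def star_set(interest, nodes):
    """Greedy set cover over one sub-TCO: a small node set covering all its topics."""
    uncovered = set().union(*(interest[v] for v in nodes))
    star = []
    while uncovered:
        best = max(sorted(nodes), key=lambda v: len(interest[v] & uncovered))
        star.append(best)
        uncovered -= interest[best]
    return star

def cross_pairs(groups):
    """All node pairs whose endpoints come from two different groups."""
    return [(u, w) for g1, g2 in combinations(groups, 2) for u in g1 for w in g2]

# Two HYPOTHETICAL sub-TCOs; each contains one node interested in everything.
interest = {
    "p1": {"a", "b", "c"}, "p2": {"a"}, "p3": {"b"}, "p4": {"c"},
    "q1": {"a", "b", "c"}, "q2": {"a"}, "q3": {"c"},
}
sub_tcos = [["p1", "p2", "p3", "p4"], ["q1", "q2", "q3"]]

nm_candidates = cross_pairs(sub_tcos)                                   # every cross-TCO pair
sm_candidates = cross_pairs([star_set(interest, g) for g in sub_tcos])  # cross-star pairs only
```

Here NM has to consider 4 × 3 = 12 cross-TCO pairs, while SM considers the single pair (p1, q1). The gap widens as the sub-TCOs grow while their star sets stay small.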
Example of SM
[Figure: SM merging sub-TCOs over the nodes v0 to v14, whose interests are drawn from the topics {a,b,c,d}.]
Running time is greatly improved because #stars << #nodes in most cases.
Divide and Conquer (DC) for MinAvg-TCO
• The number of nodes is a dominant factor for the running time of the GM algorithm.
• Divide-and-conquer
  – Divide the MinAvg-TCO problem into several sub-overlay construction problems
  – Conquer the sub-MinAvg-TCO problems independently and build sub-overlays into sub-TCOs
  – Combine these sub-TCOs into one TCO
Design of DC algorithm
• How to divide the node set V:
  – Node clustering vs. random partitioning
  – The number of partitions p
• The balance between conquer and combine
  – p = 1 (single partition): conquer only = GM
  – p = |V| (each node is a partition): combine only = GM
• How to decentralize DC:
  – Note that the DC algorithm as presented is fully centralized.
  – However, it is possible to decentralize it.
• Theoretical analysis: not straightforward.
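The three phases can be put together in a compact sketch: random partitioning to divide, the GM greedy rule on each partition to conquer, and a star-based merge to combine. Several details are assumptions made for illustration and are not taken from the slides: the link contribution is the number of shared topics whose components an edge would merge, star sets come from a greedy set cover, and all function names and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations

def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def greedy_add(interest, edges, candidates):
    """Keep `edges`, then greedily add candidate edges with maximum contribution."""
    topics = set().union(*interest.values())
    parent = {t: {v: v for v in interest if t in interest[v]} for t in topics}

    def link(u, w):
        for t in interest[u] & interest[w]:
            parent[t][find(parent[t], u)] = find(parent[t], w)

    def contribution(edge):
        u, w = edge  # ASSUMED definition: shared topics whose components get merged
        return sum(find(parent[t], u) != find(parent[t], w)
                   for t in interest[u] & interest[w])

    edges = list(edges)
    for u, w in edges:
        link(u, w)
    while candidates:
        best = max(candidates, key=contribution)
        if contribution(best) == 0:
            break
        link(*best)
        edges.append(best)
    return edges

def star_set(interest, nodes):
    """Greedy set cover: nodes of one partition that cover all of its topics."""
    uncovered = set().union(*(interest[v] for v in nodes))
    star = []
    while uncovered:
        best = max(sorted(nodes), key=lambda v: len(interest[v] & uncovered))
        star.append(best)
        uncovered -= interest[best]
    return star

def divide_and_conquer(interest, p, seed=0):
    nodes = sorted(interest)
    random.Random(seed).shuffle(nodes)
    # Divide: random partitioning into at most p non-empty parts.
    parts = [part for part in (nodes[i::p] for i in range(p)) if part]
    # Conquer: build each sub-TCO independently with the GM rule.
    edges = []
    for part in parts:
        sub = {v: interest[v] for v in part}
        edges += greedy_add(sub, [], list(combinations(part, 2)))
    # Combine: keep the internal edges and add cross-star edges only.
    stars = [star_set(interest, part) for part in parts]
    cross = [(u, w) for s1, s2 in combinations(stars, 2) for u in s1 for w in s2]
    return greedy_add(interest, edges, cross)
```

With p = 1 the combine phase has no candidates and the sketch reduces to GM; with p = |V| every node is its own star set and all the work happens in the combine phase, matching the two extremes listed on the slide.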
Example of DC
[Figure: DC applied to the nodes v0 to v14, whose interests are drawn from the topics {a,b,c,d}.]
• Divide the overlay based on V
• Conquer each sub-TCO by GM
• Combine the sub-TCOs into one by SM
Experiment setting
• The number of nodes: |V| = 1000, ranging from 1000 to 8000
• The number of topics: |T| = 100, ranging from 100 to 1000
• The number of topics subscribed to by a node: NodeIntSize = 20, ranging from 10 to 100
• Topic distribution: uniform, Zipf, exponential
Experiment design
• Evaluation: average node degree, running time
  – Star Merge for MinAvg-TCO-Join
  – DC for MinAvg-TCO
• Random node partitioning
• The effects of the number of nodes
• The effects of the number of topics
• The effects of average subscription size of a node
• Comparison with RingPT: RingPT is an algorithm that mimics the common practice of building a separate overlay for each topic.
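To make the per-topic baseline concrete, the sketch below builds one overlay per topic and takes the union of their edges. The slides do not define RingPT beyond the sentence above; connecting the subscribers of each topic in a ring is an assumption based on the name, and the function name is hypothetical.

```python
def ring_per_topic(interest):
    """Union of per-topic rings (ASSUMED reading of RingPT, for illustration)."""
    topics = sorted(set().union(*interest.values()))
    edges = set()
    for t in topics:
        members = sorted(v for v in interest if t in interest[v])
        if len(members) < 2:
            continue  # a single subscriber needs no links
        for i, u in enumerate(members):
            w = members[(i + 1) % len(members)]
            edges.add(frozenset((u, w)))  # undirected; duplicates across topics collapse
    return edges

interest = {
    "v1": {"b", "c", "d"},
    "v2": {"a"},
    "v3": {"b", "d"},
    "v4": {"a", "b"},
    "v5": {"a", "c"},
}
ring_edges = ring_per_topic(interest)
```

On the five-node example this produces 7 distinct edges, compared with the 5 edges that suffice for a topic-connected overlay on the same nodes, because building each topic's overlay separately misses most opportunities to share one edge among several topics.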
Star Merge
[Plot: SM vs NM vs GM]
Divide-and-conquer
[Plot: the effect of the number of nodes]
Divide-and-conquer
[Plot: DC vs GM vs RingPT]
Algorithm summary
Algorithm   Running time          Quality of overlay: #edges (avg node degree)   Required information   Potential to decentralize
RingPT      good                  poor                                           full knowledge         good
GM          poor: O(|V|^2 |T|)    good: O(log(|V||T|))                           full knowledge         poor
NM          poor: 75% of GM       good                                           full knowledge         good
SM          good: 1.0% of GM      good: ≤ 0.15 compared to GM                    partial knowledge      good
DC          good: 1.7% of GM      good: ≤ 2.12 compared to GM                    partial knowledge      good
Minimal Number of Links
• A typical pub/sub system combines a number of protocols, many of which maintain per-link state
  – A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
  – If the links are maintained using TCP, there is the cost of connection state for each link
  – The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits
  – If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table