PODC 2007 © 2007 IBM Corporation
Constructing Scalable Overlays for Pub/Sub With Many Topics
Problems, Algorithms, and Evaluation
G. Chockler, R. Melamed, Y. Tock, IBM Haifa Research Lab
R. Vitenberg, University of Oslo

Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a Message Bus. Subscription(N1) = {B,C,D}, N2 = {A,B,C,E}, N3 = {A,D}, N4 = {A,B,X}, N5 = {A,X}. Publish(M1, A) sends message M1 on topic A; the bus delivers copies of M1 to subscribers.]

Scalability of Pub/Sub
Most traditional pub/sub systems are geared towards small-scale deployment
  – E.g., Isis MDS, TIB, MQSeries, Gryphon
New generation of applications…
  – Large data centers: Amazon, Google, Yahoo, eBay, …
  – RSS, feed/news readers, on-line stock trading and banking
  – Web 2.0, Second Life
…drive dramatic growth in scale
  – 10,000s of nodes, 1000s of topics, Internet-wide distribution
Emerging systems address this trend using P2P techniques

Overlay-Based Pub/Sub
[Figure: nodes N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X} linked by an overlay. Publication (M1, A) is forwarded hop by hop along overlay links; one node on the path is marked "Relay".]
Example systems: SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...

Overlay Topologies for Pub/Sub
A "good" overlay allows for efficient and simple publication routing
  – Small routing tables, low load on relays, low latency
Ideally, the overlay is topic-connected, i.e., there is one connected component for each topic-induced sub-graph
  – Most existing implementations construct topic-connected overlays
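
The topic-connectivity property can be checked mechanically. The following is a minimal sketch, not taken from the talk: it assumes the subscriptions are given as a dictionary `interest` (node to set of topics) and the overlay as a list of undirected links, and the function name `is_topic_connected` is hypothetical. For each topic it walks the overlay using only nodes subscribed to that topic and tests whether all subscribers are reached.

```python
from collections import defaultdict


def is_topic_connected(interest, edges):
    """Return True if every topic-induced sub-graph of the overlay is connected.

    interest: dict node -> set of topics (assumed representation)
    edges:    iterable of (u, v) undirected overlay links
    """
    # Build adjacency sets for the overlay.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    topics = set().union(*interest.values())
    for t in topics:
        members = {v for v, ts in interest.items() if t in ts}
        # Depth-first search restricted to nodes interested in t.
        start = next(iter(members))
        seen = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u] & members:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != members:
            return False  # topic t has more than one connected component
    return True


# Subscriptions of the running example in the slides.
interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
# An illustrative overlay (chosen for this sketch, not read off a slide).
overlay = [("N1", "N2"), ("N2", "N4"), ("N4", "N5"), ("N1", "N3"), ("N2", "N3")]
print(is_topic_connected(interest, overlay))
```

Dropping the link N1 to N3 from this illustrative overlay leaves topic D with two components, so the check then fails.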

Topic-Connectivity
[Figure: an overlay over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topics B, C, X, E are connected
Topics A and D are disconnected

Topic-Connectivity: Simple Solution
[Figure: N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Node degree grows linearly with the subscription size
  – Roughly twice as big as the average subscription size for rings/trees
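
To see where the factor of two comes from, here is a small sketch under stated assumptions: every topic gets its own ring over its subscribers, and a node's ring links are counted separately per topic, which gives an upper bound on its number of distinct neighbors. The helper name `ring_link_counts` and the dictionary representation are hypothetical, not from the talk.

```python
def ring_link_counts(interest):
    """Per-node number of ring links when each topic has its own ring.

    interest: dict node -> set of topics (assumed representation).
    Links are counted once per topic, so the result is an upper bound
    on the number of distinct overlay neighbors of each node.
    """
    # Number of subscribers of each topic.
    subscribers = {}
    for topics in interest.values():
        for t in topics:
            subscribers[t] = subscribers.get(t, 0) + 1

    # In a ring over n members, each member has min(2, n - 1) ring neighbors.
    return {
        node: sum(min(2, subscribers[t] - 1) for t in topics)
        for node, topics in interest.items()
    }


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
counts = ring_link_counts(interest)
for node in sorted(counts):
    print(node, counts[node], "<=", 2 * len(interest[node]))
```

Each count is bounded by twice the node's subscription size, and the bound is approached when most topics have three or more subscribers.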

Scalability of the Simple Solution
Negative impact on performance due to
  – CPU load: neighbor monitoring, message processing
  – Connection maintenance and header overhead
  – Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?

The Min-TCO Problem
Minimum Topic-Connected Overlay (Min-TCO) problem:
  – For a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  – Construct a topic-connected overlay G with the minimum possible number of edges (or average degree)
TCO (decision version):
  – Decide whether there is a topic-connected overlay consisting of k edges (for a given k)
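
The problem statement can be written out compactly as follows. This is a restatement using the slide's symbols; the shorthand V_t, G_t, and E is introduced here for illustration and does not appear in the original. Since adding edges never breaks topic-connectivity, "k edges" is read as "at most k edges".

```latex
V_t = \{\, v \in V : \mathit{Interest}(v, t) = \text{true} \,\}, \qquad
G_t = G[V_t] \ \text{(the sub-graph of } G \text{ induced by } V_t\text{)}

\text{Min-TCO:} \quad \min_{G = (V, E)} |E|
\quad \text{subject to } G_t \text{ being connected for every } t \in T

\text{TCO:} \quad \text{is there } G = (V, E) \text{ with } |E| \le k
\text{ such that } G_t \text{ is connected for every } t \in T \, ?
```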

Complexity of TCO
Lemma: TCO(V,T,Interest,k) ∈ NP
Proof: Topic connectivity is verifiable in polynomial time
Lemma: TCO(V,T,Interest,k) is NP-hard
Proof:
  1. Define an auxiliary problem, Single Node TCO (SN-TCO), which is to decide if there is a topic-connected overlay in which the degree of a single given node is at most d
  2. Set Cover is polynomially reducible to SN-TCO
  3. SN-TCO is polynomially reducible to TCO
Theorem: TCO is NP-complete
[Figure: example instance with five nodes N1 to N5 and subscription sets {A,B,C,D}, {A,B}, {A,D}, {A,C}, {B,C,D}]

Approximating Min-TCO
The idea: exploiting subscription overlaps
  – Connecting nodes with overlapping interests improves connectivity of several topics at once
Greedy Merge (GM) algorithm:
  – Start from a singleton connected component for each (v, t) ∈ V × T
  – At each iteration: add an edge that reduces the number of connected components for the largest number of topics
  – Stop once there is a single connected component for each topic
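
The three steps of Greedy Merge can be sketched directly. This is a naive illustration under stated assumptions, not the authors' implementation: subscriptions are a dictionary `interest` (node to set of topics), every pair of nodes is a candidate edge, components are tracked by a simple label per (node, topic) pair, and ties are broken by the first pair found. The name `greedy_merge` is hypothetical.

```python
from itertools import combinations


def greedy_merge(interest):
    """Naive Greedy Merge sketch. interest: dict node -> set of topics."""
    # One singleton component per (node, topic) pair the node subscribes to;
    # comp[(v, t)] is the label of v's component in topic t.
    comp = {(v, t): v for v, ts in interest.items() for t in ts}
    nodes = sorted(interest)
    edges = []

    def merged_topics(u, v):
        # Shared topics in which u and v still lie in different components.
        return [t for t in interest[u] & interest[v]
                if comp[(u, t)] != comp[(v, t)]]

    while True:
        # Find the edge that merges components for the most topics.
        best, best_topics = None, []
        for u, v in combinations(nodes, 2):
            ts = merged_topics(u, v)
            if len(ts) > len(best_topics):
                best, best_topics = (u, v), ts
        if best is None:
            break  # no edge merges anything: each topic is one component

        u, v = best
        edges.append(best)
        # Relabel v's component as u's component in every merged topic.
        for t in best_topics:
            old, new = comp[(v, t)], comp[(u, t)]
            for w in nodes:
                if comp.get((w, t)) == old:
                    comp[(w, t)] = new
    return edges


interest = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}
edges = greedy_merge(interest)
print(edges)
print("average degree:", 2 * len(edges) / len(interest))
```

On the five-node example this sketch adds five edges, which corresponds to an average degree of 2.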

Greedy Merge
[Figure: overlay built so far over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       4
B       3
C       2
D       2
X       2
E       1

Greedy Merge
[Figure: overlay built so far over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       4
B       2
C       1
D       2
X       2
E       1

Greedy Merge
[Figure: overlay built so far over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       3
B       1
C       1
D       2
X       2
E       1

Greedy Merge
[Figure: overlay built so far over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       2
B       1
C       1
D       2
X       1
E       1

Greedy Merge
[Figure: overlay built so far over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       2
B       1
C       1
D       1
X       1
E       1

Greedy Merge
[Figure: final overlay over N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic   # of conn. comps
A       1
B       1
C       1
D       1
X       1
E       1
Average degree of 2 vs. almost 3 for ring-per-topic!

GM Running Time
O(|V|^4 |T|)
  – At most |V|^2 iterations
  – At most |V|^2 edges inspected at each iteration
  – At most |T| steps to inspect an edge
Can be optimized to run in O(|V|^2 |T|)
  – For each e ∈ V × V, weight(e) = the number of connected components merged by e
  – At each iteration, output the heaviest edge and adjust the other edge weights accordingly
  – Stop once there are no more edges with weight > 0

Approximability Results
Lemma:
  1. The number of edges in the overlay constructed by GM ≤ log(|V||T|) · OPT
     Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
  2. There exists an input on which GM's output meets this ratio
Theorem: No polynomial-time algorithm can approximate Min-TCO within a constant factor (unless P=NP)
Proof: Existence of such an algorithm would imply existence of a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)

More Overlay Design Problems
Filtering: Given an upper bound d on the node degree, minimize the number of relays used to connect each topic
  – Captures the cases when full topic-connectivity is infeasible because of resource constraints
Diameter: Given an upper bound d on the node degree, minimize the diameter of each topic in the overlay
  – Latency-optimal routing under resource constraints
…

Conclusions
Initiated formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
Defined a representative problem (Min-TCO) capturing the cost of constructing topic-connected overlays
  – NP-completeness, polynomial approximation, inapproximability results
Empirical evaluation showed effectiveness of our approximation algorithm on practical inputs

Future Directions
Study dynamic case
Investigate other overlay design problems
Study distributed case
  – Partial knowledge of other nodes' interests
  – Dynamically changing interest assignments