IBM Research
© 2011 IBM Corporation
BlueDove Pub/Sub: A Scalable and Elastic Publish/Subscribe Service
Fan Ye, IBM T. J. Watson Research Center
In collaboration with Ming Li, Minkyong Kim, Han Chen, Hui Lei
Emerging Smarter Planet Applications
Smarter Transportation, Smarter Grid
[Figure: traffic cameras, smartphones, and speed sensors report current location & speed; a user asks "How is the traffic from my current location to my destination?"]
Requires a highly efficient messaging service
Challenges to the Messaging Service
– Large number of clients, high information rate: Scalability
– Dynamic workload: Elasticity
– Personalized information: Real-time fine-grained filtering
– Uninterrupted service: High availability
General Publish/Subscribe Messaging Model
[Figure: Publishers send Events to the Pub/Sub Server; Subscribers register Subscriptions and receive matching Events]
The pub/sub server:
– Stores subscriptions
– Matches events to subscriptions
– Pushes events to subscribers
Decoupling of information producers/consumers; fine-grained event filtering
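The model above can be sketched as a minimal in-memory broker. All names here are illustrative, not BlueDove's actual API:

```python
# Minimal in-memory pub/sub broker: stores subscriptions,
# matches events against them, and pushes matches to subscribers.
# Hypothetical sketch -- names are not from the BlueDove system.

class Broker:
    def __init__(self):
        self.subs = []  # list of (attribute-range filter, callback)

    def subscribe(self, ranges, callback):
        # ranges: {attribute: (low, high)} -- a fine-grained filter
        self.subs.append((ranges, callback))

    def publish(self, event):
        # event: {attribute: value}; deliver to every matching subscriber.
        # A missing attribute compares as NaN, so it never matches.
        for ranges, cb in self.subs:
            if all(lo <= event.get(a, float("nan")) <= hi
                   for a, (lo, hi) in ranges.items()):
                cb(event)

received = []
b = Broker()
b.subscribe({"Speed": (30, 40)}, received.append)
b.publish({"Speed": 35})   # matches the range filter
b.publish({"Speed": 55})   # filtered out
```

The point of the model is the decoupling: the publisher never learns who receives the event, and the subscriber's filter is evaluated inside the broker.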
Existing Pub/Sub Systems
Centralized enterprise pub/sub
– Multiple fully connected, dedicated cluster servers
Peer-to-peer public pub/sub
– Large number of partially connected, globally distributed machines
– Examples: Meghdoot, PastryStrings, Hermes
Centralized Enterprise Pub/Sub
Supports enterprise applications
– Relatively static clients
– Predictable workload
– Strict requirements on response time
Design
– Centralized and relatively static servers
– Fully connected topology
– Subscriptions are replicated to all servers
Challenges
– Hard to provision for a highly dynamic workload
– Heavy administration at large system scale
[Figure: one publisher, four subscribers; every server stores the full subscription set Sub1, Sub2, Sub3, Sub4]
Peer-to-Peer Public Pub/Sub
Supports public applications
– Clients join or leave frequently
– Highly dynamic workload
– Flexible requirements on response time and false rate
Architecture
– Publishers and subscribers also act as pub/sub servers
– Servers form a self-maintained logical overlay
– Subscriptions/publications are mapped onto the overlay
– Routing is done in a multi-hop manner
Challenges
– Bottlenecks due to skewed distributions
– Performance issues due to the dynamics and heterogeneity of servers
– Large network delay
[Figure: DHT ring with nodes 0, 10, 40, 75, 90, 120, each owning a section of the Speed attribute ((0~10], (10~40], (40~75], (75~90], (90~120], (120~0]); subscription {Speed: 30~40} is stored at the node owning (10~40], and event {Speed: 35} is routed to that same node]
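The ring mapping in the figure can be sketched as follows. This is a simplified single-attribute partition with a static node list and no multi-hop routing; node names are hypothetical:

```python
import bisect

# Each ring node owns a contiguous section of the Speed attribute:
# the node with upper bound u owns the range (previous_u, u].
bounds = [10, 40, 75, 90, 120]          # section upper bounds, sorted
nodes = ["n10", "n40", "n75", "n90", "n120"]

def owner(speed):
    # First upper bound >= speed identifies the owning section.
    i = bisect.bisect_left(bounds, speed)
    return nodes[i % len(nodes)]        # wrap around past the last bound

def owners_of_range(lo, hi):
    # A range subscription may span several nodes' sections.
    i = bisect.bisect_left(bounds, lo)
    j = bisect.bisect_left(bounds, hi)
    return nodes[i:j + 1]

# Subscription {Speed: 30~40} and event {Speed: 35} land on the same node,
# which is exactly why matching works without broadcasting the event.
```

In a real DHT the lookup would hop through the overlay rather than index a global table, which is the source of the multi-hop latency the slide criticizes.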
BlueDove Pub/Sub Cloud
[Figure: BlueDove combines the strengths of centralized enterprise pub/sub and peer-to-peer public pub/sub]
BlueDove Pub/Sub Cloud Architecture
[Figure: publishers and subscribers (clients) connect to pub/sub servers inside a data center; subscriptions and events flow through three components:]
– Gossip-based single-hop overlay
– Multi-dimension subscription assignment
– Performance-oriented event forwarding
Gossip-based Single-hop Overlay
Each server maintains a global view of the overlay
– The contact information of every other server
– Enables one-hop forwarding
Overlay changes are propagated by gossip
– Each server periodically exchanges info with a few random servers
– Any state change is propagated to the whole network in log(N) rounds
Advantages
– Automatic, low-overhead maintenance
– Low network delay
[Figure: six servers (1–6); each keeps an overlay table of (ID, IP) entries for the other servers, e.g. {2: 192.168.1.2, 4: 192.168.1.4}, converging toward the full membership view]
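A toy version of the gossip exchange looks like this. It is illustrative only; a production gossiper (such as Cassandra's, which BlueDove builds on) also carries heartbeats and failure detection:

```python
import random

# Each server keeps an overlay table: {server_id: ip_address}.
# In one gossip round, every server merges views with a few random peers.

def gossip_round(tables, fanout=2):
    ids = list(tables)
    for sid in ids:
        peers = random.sample([i for i in ids if i != sid],
                              min(fanout, len(ids) - 1))
        for peer in peers:
            # Exchange: both sides end up with the union of their views.
            merged = {**tables[sid], **tables[peer]}
            tables[sid] = dict(merged)
            tables[peer] = dict(merged)

# Start: each of 6 servers knows only itself.
tables = {i: {i: f"192.168.1.{i}"} for i in range(1, 7)}
rounds = 0
while any(len(t) < 6 for t in tables.values()):
    gossip_round(tables)
    rounds += 1
# After a handful of rounds (on the order of log N), every overlay
# table holds all 6 entries, enabling one-hop forwarding.
```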
Multi-dimensional Subscription Assignment
Multiple dimensions are mapped onto the overlay simultaneously
– Each dimension is partitioned into sections
– Each server maintains a section of each dimension
– Partition information is propagated by gossip and known to all servers
Each subscription is mapped in all K dimensions and stored by K servers simultaneously
– Each server maintains a separate index for the subscriptions assigned to it on each dimension
Advantages
– Redundancy across servers
– Allows us to exploit the skewness in the distribution of subscriptions
[Figure: two dimensions, Speed (sections S1–S4) and Location_x (sections L1–L4), spread over four servers; a subscription {Speed: 10~20, Longitude: 40°~41°} is stored once under its Speed section and once under its Longitude section]
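The K-way assignment can be sketched as follows. The partition boundaries and server names here are made up for illustration; in BlueDove they are gossiped to all servers:

```python
import bisect

# Per-dimension section boundaries (upper bounds) and the server owning
# each section. Note the same four servers host sections of BOTH
# dimensions, just in a different arrangement.
partitions = {
    "Speed":     ([250, 500, 750, 1000], ["srv1", "srv2", "srv3", "srv4"]),
    "Longitude": ([30, 60, 90, 180],     ["srv4", "srv3", "srv2", "srv1"]),
}

def assign(subscription):
    # Map the subscription once per dimension -> K candidate servers.
    placements = {}
    for dim, (lo, hi) in subscription.items():
        bounds, owners = partitions[dim]
        # Store at the server whose section contains the range start
        # (a real system would split ranges that span sections).
        placements[dim] = owners[bisect.bisect_left(bounds, lo)]
    return placements

placement = assign({"Speed": (10, 20), "Longitude": (40, 41)})
# The 2-dimensional subscription is stored by 2 servers, one per dimension.
```

Because the subscription lives on K servers, any one of them can fully match an event, which is what gives the forwarder a choice in the next slide.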
Performance-Oriented Event Forwarding
Select the most efficient dimension to match events
– Servers exchange per-dimension load information through gossip
– Different load metrics: #subscriptions, average response time, CPU load, …
– Each event has K candidate servers, one per dimension
– Select the server with the best performance metric
Policy 1: minimum number of subscriptions
– Does not work well in heterogeneous environments
Policy 2: minimum average response time
– May not adapt fast enough when server load changes quickly
Policy 3: minimum predicted response time
– Predicted time = Match_Time * (QLen_old + T * (Event_Rate - Match_Rate))
– Adapts to heterogeneous environments and fast-changing server load
Advantages
– Exploits the skewness of subscriptions
– Load balancing
[Figure: event {Speed: 25, Longitude: 33°} has one candidate server per dimension, holding 1 subscription and 3 subscriptions respectively; the forwarder picks the cheaper candidate]
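Policy 3 can be sketched like this, following the slide's formula; the field names and sample numbers are mine:

```python
# Pick the candidate server with the minimum predicted response time:
#   Match_Time * (QLen_old + T * (Event_Rate - Match_Rate))
# i.e. per-event matching cost times the queue length expected after
# T more seconds of the current arrival/service-rate imbalance.

def predicted_response_time(stats, horizon):
    backlog = stats["qlen"] + horizon * (stats["event_rate"] - stats["match_rate"])
    return stats["match_time"] * max(backlog, 0.0)  # queue can't go negative

def pick_server(candidates, horizon=1.0):
    # candidates: {server_id: per-dimension load stats learned via gossip}
    return min(candidates,
               key=lambda sid: predicted_response_time(candidates[sid], horizon))

candidates = {
    # Slower matcher, but its queue is draining (match rate > event rate).
    "srv_speed": {"match_time": 0.002, "qlen": 50,
                  "event_rate": 900, "match_rate": 1000},
    # Faster matcher, but a large and still-growing backlog.
    "srv_long":  {"match_time": 0.001, "qlen": 400,
                  "event_rate": 1200, "match_rate": 1000},
}
best = pick_server(candidates)
```

Unlike policy 1, this accounts for server heterogeneity through `match_time`; unlike policy 2, the rate term reacts to load changes before the average response time catches up.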
Implementation
[Figure: component diagram]
– Dispatcher: Subscription Forward (Partitioner); Event Forward (Dimension Select, Server Map, Load Collector, Load Predictor)
– Pub/Sub Server: Matching Engine over stored Subscriptions; Gossiper (Cassandra) providing the One-hop Overlay, Load Monitor, and Bootstrap
Testbed and Experiment Setup
Testbed
– 25 RC2 machines
• 20 pub/sub servers; 3 dispatchers; 2 to simulate clients
Simulators simulate publishers and subscribers
– Each subscription or publication has four attributes
– Each attribute in a subscription represents a range of width 200
• The center of the range follows a Normal distribution cropped to the attribute range of size 1000, with standard deviation 250 and mean at 125, 375, 625, 875 for dimensions 1–4
– Each attribute in an event represents a point in [0, 1000]
• The point follows a uniform random distribution
Metrics
– Response time: the time from when a publication is generated to when it is received by the corresponding subscribers
– Saturation event rate: the highest event rate the system can handle without publication queue overflow
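The workload described above can be reproduced roughly as follows. This is a sketch; the exact cropping rule used in the experiments is not specified in the talk, so here the range is simply clamped inside the domain:

```python
import random

MEANS = [125, 375, 625, 875]     # per-dimension subscription centers
STDEV, WIDTH, DOMAIN = 250, 200, 1000

def make_subscription():
    # Four range attributes; each center ~ Normal(mean_d, 250),
    # clamped so the width-200 range stays inside [0, 1000].
    sub = []
    for mean in MEANS:
        center = min(max(random.gauss(mean, STDEV), WIDTH / 2),
                     DOMAIN - WIDTH / 2)
        sub.append((center - WIDTH / 2, center + WIDTH / 2))
    return sub

def make_event():
    # Four point attributes, uniform over [0, 1000].
    return [random.uniform(0, DOMAIN) for _ in MEANS]
```

The staggered means make each dimension skewed around a different region, which is precisely the skewness the assignment and forwarding schemes are designed to exploit.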
Response Time Behavior vs. Event Rate
[Chart: response time series before saturation (40K subscriptions, 20 servers, 100K events/sec); x-axis: time (ms), y-axis: response time (ms)]
[Chart: response time series after saturation (40K subscriptions, 20 servers, 150K events/sec); same axes]
Impact of Event Rate on Average Response Time
Compare BlueDove to the full-replication scheme
– 20 servers
– 40,000 subscriptions
[Chart: response time (ms, log scale 1 to 1,000,000) vs. event rate (1,000 events/sec); series: full-rep, bluedove]
Scalability with Event Rate
Fix the number of subscriptions (40K); increase the number of servers
[Chart: saturation event rate (events per ms) vs. number of servers (5–20); series: bluedove, full-rep, p2p]
Scalability with Number of Subscriptions
Fix the event rate; increase the number of servers
[Chart: number of subscriptions supported (0–60,000) vs. number of servers (5–20)]
CPU Load Distribution
CPU load across different servers
– 20 servers; 40,000 subscriptions; 100,000 events/sec for BlueDove, 27,000 events/sec for the p2p scheme
[Chart: CPU load vs. server ID (1–19); series: bluedove, p2p]
Matching Load Distribution
Number of events (×10⁹) matched on different servers
– 20 servers; 40,000 subscriptions; 100,000 events/sec for BlueDove, 27,000 events/sec for the p2p scheme
[Chart: number of events matched (×10⁹) vs. server ID (1–19); series: bluedove, p2p]
Different Event Dispatching Policies
Compare random, subscription-number-based, response-time-based, and prediction-based policies
[Chart: saturation event rate (0–7,000) for the Prediction, Rsp-Time, and Sub-Num policies]
Elasticity under Increasing Load
The event rate increases continuously; initially there are 5 nodes
[Chart: system behavior over time (ms) as nodes 6, 7, and 8 are added; event rate grows from 600 to 6,000 events/sec]
Impact of the Number of Partition Dimensions
Fix the number of subscriptions and change the number of partition dimensions
– 20 servers; 40,000 subscriptions
[Chart: saturation event rate (events per ms, 0–120) vs. number of dimensions (1–4)]
Impact of the Skewness of the Subscription Distribution
Change the standard deviation of the subscription distribution
– 20 servers; 40,000 subscriptions
[Chart: saturation event rate (events per ms, 60–120) vs. standard deviation (200–1,000)]
Summary
A cloud-based publish/subscribe service answers the challenge of interconnecting the numerous information producers and consumers of the Smarter Planet
BlueDove exploits skewness in the data distribution and differences across multiple dimensions to achieve high performance
Evaluation has shown linear capacity scaling and agile adaptation to workload changes
Thank you!
Questions?
Impact of the Skewness of the Event Distribution
Change the number of event dimensions that are skewed in the same way as their corresponding subscription dimensions
– 20 servers; 40,000 subscriptions
[Chart: saturation event rate (events per ms, 40–120) vs. number of skewed dimensions (0–4)]
Future Work
Explicit load balancing
– Move subscriptions from overloaded nodes to less-loaded ones
Integration with the BlueDove queue
– The BlueDove queue takes over the responsibility of pushing publications to subscribers
Persistence
– "Outsource" data storage to a cloud-based store (e.g., Cassandra)
Cross-Data-Center Replication of Subscriptions
Implicit replication
– Each subscription is already replicated to multiple servers
– With high probability, at least two of them are in different data centers
Explicit replication
– If all replicas land in the same data center, explicitly make a copy in another data center