carnegie mellon university complex queries in distributed publish- subscribe systems ashwin r....

22
Carnegie Mellon University Complex queries in distributed publish-subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Post on 20-Dec-2015

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Complex queries in distributed publish-subscribe systems

Ashwin R. Bharambe,

Justin Weisz and

Srinivasan Seshan

Page 2: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Outline

Publish-subscribe systems Subscription Languages Routing Protocols

MERCURY architecture Preliminary evaluation

Scalability Performance

Future work

Page 3: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Publish-subscribe systems

Challenges Subscription language - how to express “interests”? Routing mechanism - how is content routed and where

is it matched?

MatcherS P

?

PublicationSubscription

MatchedPublication

Page 4: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Subscription language

How to express interests? Channels or Subjects All content attributes

Operators? Exact matches Range queries Regular expressions

Subject = MATH

Name = ‘A’ & Age < 25

Price = 350$

300$ ≤ Price ≤ 350$

Name ~= Ash[win]*B*

Page 5: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

MERCURY subscription language Subscription =

list of (type, attribute, rel-op, value) Can implement boolean expressions

(AND/ORs)

( int, price, LESS_THAN, 300 ) ( string, name, EQUALS, “Opensig” )

Publication = list of (type, attribute, value)

Page 6: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Example: Virtual reality

User

x ≥ 50x ≤ 150y ≥ 150y ≤ 250

Interests

x 100y 200

Events

(100,200)

(150,150)

(50,250)

Arena

Virtual World

Page 7: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Routing mechanism

Centralized? Easy to make publications “meet” subscriptions Single point of failure – not robust!

Distributed? Where are subscriptions stored? How do publications “meet” subscriptions? Broadcast-based solutions not scalable

Page 8: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Distributed routing - goals

Scalability is a key goal Flooding anything is bad, bad, bad… System should not have hot-spot in terms of:

Computational load – matching ) Subscriptions should be evenly distributed

Number of packets routed or received ) Publications should be evenly distributed

Yet – we should have low delivery delays!!

Page 9: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Previous work

Systems like Scribe use DHTs for scalability Why can’t we ? Exact matches vs. Range queries!

int x 1 int x 10hash

Subscription

int x = 9hash

Publication

?

How about generating 10 subscriptions ? Too many subscriptions Works for discrete-valued attributes only

Page 10: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Attribute Hubs

Divide range of an attribute into bins Each node responsible for range of attribute values

Hprice

[240, 320)

[80, 160)

[160, 240)[0, 80)

Attribute Hub

Hub-nodes connected through a circular overlay

Circle only for connectivity One hub per attribute Routing algo

compare value in content to my range

Page 11: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Routing illustrated

Send subscription to any one attribute hub Send publications to all attribute hubs

[0, 80)

[210, 320)

50 ≤ x ≤ 150150 ≤ y ≤ 250

x 100y 200

Hx

[240, 320)

[80, 160)

[160, 240)

Hy

[0, 105)

[105, 210)

Subscription

Publication

Rendezvous point

Page 12: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Efficient routing

Reduce number of hops Each hub-node maintains “small” number of

pointers to distant parts of the hub How to maintain these pointers?

Send ACKs for publication receipts Various caching policies determine the structure

of the pointer table e.g., LRU, Uniform-spacing, Exponential-spacing

Page 13: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Routing illustrated

[0, 80)

[240, 320)

[80, 160)

[160, 240)

Hx

x 100y 200

Publication

[105, 210)Hy

[210, 320)

[0, 105)

ACK

AC

K

Page 14: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Evaluation

Workload Experimental setup Metrics

Page 15: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Workload

One of our target apps multi-player games Model

Virtual world as square Subscriptions as rectangles around current

positions

Page 16: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Experimental setup

Player movements simulated using mobility models from ns-2

Two hubs – x and y co-ordinates Half the nodes in each hub Uniform partition of range

Page 17: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Metrics

Scalability metric load Number of publications routed by a node Averaged over time

Performance metric publication delivery delay Time between sending of a publication and its

receipt by all subscribers Averaged over all subscribers of a publication Averaged over all publications

Page 18: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Results: scalability

0

0.05

0.1

0.15

0.2

0.25

0.3

0.4 0.6 0.8 1 1.2 1.4 1.6Normalized load

Fra

ctio

n o

f n

od

es

100 nodes

50 nodes

20 nodes Ideal graph: delta function

Observed variation: ±12%

Page 19: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Results: performance

0

5

10

15

20

25

20 nodes 50 nodes 72 nodes 100nodes

200nodes

Dela

ys - in

secs

single

n-lru

n-uniform

nocache

graph-dia

Without caching: linear scaling

Caching reduces delays to near optimal

Workload effects ?

47.25

Cache size: log(n)

Page 20: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Conclusions

Expressive subscription language Decentralized architecture Scalability

Avoids flooding of subscriptions and publications – reduces network traffic

Distributes publications and subscriptions throughout the network – prevents swamping

Page 21: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Future Work

Load balancing Sensitive to data value

distribution Adapt ranges dynamically

according to the distribution

Affects pointer management, caching, etc.

x

Pr(

X=

x)

Page 22: Carnegie Mellon University Complex queries in distributed publish- subscribe systems Ashwin R. Bharambe, Justin Weisz and Srinivasan Seshan

Carnegie Mellon University

Future Work

Perform sensitivity analysis for different kinds of workloads

Generic API for building applications on top of MERCURY To be released soon

Build a full-fledged distributed Quake-II