Management of Uncertainty in Publish/Subscribe Systems. Haifeng Liu, Department of Computer Science, University of Toronto

Post on 06-Jan-2016


Page 1: Management of Uncertainty in Publish/Subscribe Systems

Management of Uncertainty in Publish/Subscribe Systems

Haifeng Liu

Department of Computer Science

University of Toronto

Page 2: Management of Uncertainty in Publish/Subscribe Systems

Publish/Subscribe Model

[Figure: publishers (stock markets NYSE, NASDAQ, TSX) send publications into a broker network; subscribers register subscriptions with the network and receive notifications.]

Publications: IBM=84, MSFT=27, INTC=19, JNJ=58, ORCL=12, HON=24, AMGN=58

Subscriptions: IBM > 85, ORCL < 10, JNJ > 60

Page 3: Management of Uncertainty in Publish/Subscribe Systems

Applications Enabled by Publish/Subscribe

• Selective information dissemination
• Information filtering on the Internet
• Location-based services
• Workflow management
• Intra-enterprise process automation
• Logistics and supply chain management
• Enterprise application integration
• Network monitoring and (distributed) system management

Page 4: Management of Uncertainty in Publish/Subscribe Systems

Types of Uncertainties

• Lack of information
  – Buy a cheap car
• Imprecision
  – Sensor data: temperature 15~20ºC
  – Location: location (x, y), location at t+1 (x’, y’)
• Semantics
  – Synonyms: vehicle vs. automobile
  – Class taxonomy: CD player vs. electronics
  – Different expression: 5 years experience vs. graduated in 2001

Problem: manage uncertainties, imprecision and semantics in publish/subscribe systems

Page 5: Management of Uncertainty in Publish/Subscribe Systems

Agenda

• Distributed Publish/Subscribe Model and Content-based Routing
• Uncertainties in Publish/Subscribe
• Research Challenges
• Approximate P/S Model
• Graph-structured Model
• Current Status
• Research Plan

Page 6: Management of Uncertainty in Publish/Subscribe Systems

Publish/Subscribe Messages

• Advertisement (ad)
  – Publication patterns used by publishers to announce the set of publications they are going to publish
  – E.g. { (stock, any), (price, any) }
• Subscription (sub)
  – User interest specification
  – E.g. (stock = “yahoo”) & (price ≤ $35)
• Publication (pub)
  – Information, data, event
  – E.g. { (stock, “yahoo”), (price, $32.79) }
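The three message types can be sketched in Python as follows. This is only an illustration under assumptions: the names `Predicate`, `matches`, `conforms` and `ANY` are hypothetical and do not come from the slides, and a subscription is taken to be a conjunction of simple comparisons.

```python
import operator
from dataclasses import dataclass

ANY = object()  # hypothetical wildcard standing for "any" in an advertisement

OPS = {"=": operator.eq, "<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Predicate:
    attr: str
    op: str
    value: object

    def holds(self, publication):
        # A predicate on a missing attribute is treated as not satisfied.
        if self.attr not in publication:
            return False
        return OPS[self.op](publication[self.attr], self.value)


def matches(subscription, publication):
    """A subscription (conjunction of predicates) matches if all predicates hold."""
    return all(p.holds(publication) for p in subscription)


def conforms(advertisement, publication):
    """A publication conforms to an ad if it uses only advertised attributes and values."""
    return all(attr in advertisement
               and (advertisement[attr] is ANY or advertisement[attr] == value)
               for attr, value in publication.items())


ad = {"stock": ANY, "price": ANY}
sub = [Predicate("stock", "=", "yahoo"), Predicate("price", "<=", 35.0)]
pub = {"stock": "yahoo", "price": 32.79}

print(conforms(ad, pub), matches(sub, pub))
```

The example values mirror the slide: the publication for yahoo at $32.79 satisfies both predicates of the subscription.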

Page 7: Management of Uncertainty in Publish/Subscribe Systems

Content-based Routing

Advertising

[Figure: publishers and subscribers connected through a distributed overlay broker network; an Advertisement propagates through the network.]

*Adopted from SIENA, Gryphon, REBECA and Hermes

Page 8: Management of Uncertainty in Publish/Subscribe Systems

Content-based Routing

Subscribing

[Figure: publishers and subscribers connected through a distributed overlay broker network; a Subscription propagates through the network.]

*Adopted from SIENA, Gryphon, REBECA and Hermes

Page 9: Management of Uncertainty in Publish/Subscribe Systems

Content-based Routing

Publishing

[Figure: publishers and subscribers connected through a distributed overlay broker network; a Publication propagates through the network.]

*Adopted from SIENA, Gryphon, REBECA and Hermes

Page 10: Management of Uncertainty in Publish/Subscribe Systems

Subscription Forwarding I

Covering optimization

[Figure: publishers and subscribers connected through a distributed overlay broker network, with subscriptions S1 and S2 and publication P.]

S1: (car = Honda) & (price ≤ $30K)

S2: (car = Honda) & (price ≤ $25K)

S1 covers S2

P: { (car, Honda), (price, $20K) }

*Adopted from SIENA, Gryphon, REBECA and Hermes
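A covering test for the example above can be sketched as follows. It assumes subscriptions are conjunctions that use only `=` and `<=` constraints, stored as a dictionary from attribute to `(operator, value)`; the function names are hypothetical, not taken from the slides.

```python
def implies(narrow, wide):
    """True if every value satisfying `narrow` also satisfies `wide`."""
    n_op, n_val = narrow
    w_op, w_val = wide
    if w_op == "=":
        return n_op == "=" and n_val == w_val
    if w_op == "<=":
        return n_op in ("=", "<=") and n_val <= w_val
    raise ValueError(f"unsupported operator: {w_op}")


def covers(s1, s2):
    """s1 covers s2 if every constraint of s1 is implied by s2's constraint
    on the same attribute, so anything matching s2 also matches s1."""
    return all(attr in s2 and implies(s2[attr], constraint)
               for attr, constraint in s1.items())


S1 = {"car": ("=", "Honda"), "price": ("<=", 30_000)}
S2 = {"car": ("=", "Honda"), "price": ("<=", 25_000)}

print(covers(S1, S2), covers(S2, S1))
```

When S1 covers S2, a broker that has already forwarded S1 does not need to forward S2 in the same direction, because every publication that matches S2 is already requested by S1.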

Page 11: Management of Uncertainty in Publish/Subscribe Systems

Subscription Forwarding II

Merging optimization

[Figure: publishers and subscribers connected through a distributed overlay broker network; subscriptions S1 and S2 are replaced by the merged subscription S’.]

S1: (car = Honda) & (price ≤ $30K)

S2: (car = Toyota) & (price ≤ $25K)

S’: (car = any) & (price ≤ $30K)

P: { (car, Honda), (price, $20K) }

*Adopted from SIENA, Gryphon, REBECA and Hermes
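The merge shown above can be sketched in Python. The rule used here is an assumption chosen to reproduce the slide's example: equal constraints are kept, two upper bounds are widened to the larger bound, and anything else is relaxed to "any". All names are hypothetical.

```python
ANY = ("any", None)  # hypothetical encoding of an unconstrained attribute


def merge_constraint(c1, c2):
    if c1 == c2:
        return c1
    if c1[0] == "<=" and c2[0] == "<=":
        return ("<=", max(c1[1], c2[1]))
    return ANY


def merge(s1, s2):
    """Merge two conjunctive subscriptions over their shared attributes."""
    return {attr: merge_constraint(s1[attr], s2[attr])
            for attr in s1.keys() & s2.keys()}


def satisfied(constraint, value):
    op, bound = constraint
    if op == "any":
        return True
    if op == "=":
        return value == bound
    if op == "<=":
        return value <= bound
    raise ValueError(f"unsupported operator: {op}")


def matches(sub, pub):
    return all(attr in pub and satisfied(c, pub[attr]) for attr, c in sub.items())


S1 = {"car": ("=", "Honda"), "price": ("<=", 30_000)}
S2 = {"car": ("=", "Toyota"), "price": ("<=", 25_000)}
merged = merge(S1, S2)

P = {"car": "Honda", "price": 20_000}
extra = {"car": "Toyota", "price": 28_000}  # matches the merger only
print(merged, matches(merged, P), matches(merged, extra))
```

The merger in this example is imperfect: a Toyota at $28K matches S’ but neither S1 nor S2, so the broker that holds the original subscriptions still has to filter such publications before delivery.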

Page 12: Management of Uncertainty in Publish/Subscribe Systems

Publish/Subscribe Router

• Forwarding of advertisements
  – Via flooding
• Forwarding of subscriptions
  – Forward along reverse ad path
    • Matching of ad and sub (intersecting)
  – Optimizations
    • Covering/merging of subs
• Forwarding of publications
  – Forward along reverse sub path
    • Matching of sub and pub
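The router behaviour listed above can be sketched as a toy in-memory broker. This is a minimal illustration under stated assumptions, not the implementation of any of the cited systems: the overlay is assumed to be acyclic, an advertisement is reduced to a set of attribute names, "intersecting" is simplified to "the subscription only uses advertised attributes", and covering/merging are omitted. The `Broker` class and `LOCAL` marker are hypothetical.

```python
LOCAL = "local client"  # marker for messages that enter from an attached client


def matches(sub, pub):
    """A publication matches when every predicate of the subscription holds."""
    return all(attr in pub and test(pub[attr]) for attr, test in sub.items())


class Broker:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.ad_table = []    # (advertised attribute set, hop the ad came from)
        self.sub_table = []   # (subscription, hop the subscription came from)
        self.delivered = []   # publications handed to local subscribers

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def advertise(self, ad, came_from=LOCAL):
        # Advertisements are flooded to every neighbor except the sender.
        self.ad_table.append((ad, came_from))
        for hop in self.neighbors:
            if hop is not came_from:
                hop.advertise(ad, came_from=self)

    def subscribe(self, sub, came_from=LOCAL):
        # Subscriptions follow the reverse path of each intersecting ad.
        self.sub_table.append((sub, came_from))
        next_hops = []
        for ad, ad_hop in self.ad_table:
            intersects = set(sub) <= ad
            if intersects and ad_hop is not LOCAL and ad_hop is not came_from:
                if ad_hop not in next_hops:
                    next_hops.append(ad_hop)
        for hop in next_hops:
            hop.subscribe(sub, came_from=self)

    def publish(self, pub, came_from=LOCAL):
        # Publications follow the reverse path of each matching subscription.
        next_hops = []
        for sub, sub_hop in self.sub_table:
            if not matches(sub, pub):
                continue
            if sub_hop is LOCAL:
                self.delivered.append(pub)
            elif sub_hop is not came_from and sub_hop not in next_hops:
                next_hops.append(sub_hop)
        for hop in next_hops:
            hop.publish(pub, came_from=self)


# Overlay: b1 - b2 - b3, with b4 also attached to b2.
b1, b2, b3, b4 = (Broker(n) for n in ("b1", "b2", "b3", "b4"))
b1.connect(b2)
b2.connect(b3)
b2.connect(b4)

b1.advertise(frozenset({"stock", "price"}))
b3.subscribe({"stock": lambda v: v == "yahoo", "price": lambda v: v <= 35})
b1.publish({"stock": "yahoo", "price": 32.79})
b1.publish({"stock": "yahoo", "price": 40.0})
print(b3.delivered, b4.sub_table)
```

In this run the advertisement reaches every broker, the subscription travels only along b3, b2, b1, and only the matching publication is forwarded; b4 never sees the subscription or any publication.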

Page 13: Management of Uncertainty in Publish/Subscribe Systems

Uncertainties in Distributed Publish/Subscribe System

• Messages (representation: modeling)
  – Uncertain subscription
  – Uncertain publication
• Relations (computation: matching, covering, merging)
  – Between sub and pub
  – Between sub and sub
• Result (aggregation: ranking)
  – Return top K matches

Page 14: Management of Uncertainty in Publish/Subscribe Systems

Research Challenges

• Develop a publish/subscribe model to express uncertainties/semantics in publications and subscriptions

• Model approximate matching and semantic matching

• Model approximate covering/merging and semantic covering/merging

• Scale to a large number of subscribers and a high publishing rate

Page 15: Management of Uncertainty in Publish/Subscribe Systems

Approximate Matching Model

• Model
  – Sub: fuzzy set
  – Pub: possibility distribution
• Matching
  – Possibility measure
  – Necessity measure
• Ranking
  – “min” or “product” for conjunction
  – “max” or “plus” for disjunction
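The slide does not give formulas, so the sketch below uses the standard definitions from possibility theory: the possibility of a match is the supremum over x of min(μ(x), π(x)), and the necessity is the infimum over x of max(μ(x), 1 − π(x)), where μ is the subscription's membership function and π is the publication's possibility distribution. The membership shapes ("cheap", "about $15K"), the integer price grid and all function names are assumptions made for illustration.

```python
import math


def possibility(mu, pi, domain):
    """Degree to which the publication possibly satisfies the subscription."""
    return max(min(mu(x), pi(x)) for x in domain)


def necessity(mu, pi, domain):
    """Degree to which the publication necessarily satisfies the subscription."""
    return min(max(mu(x), 1.0 - pi(x)) for x in domain)


def cheap(price_k):
    # Subscription as a fuzzy set: fully cheap up to $10K, not cheap from $20K.
    if price_k <= 10:
        return 1.0
    if price_k >= 20:
        return 0.0
    return (20 - price_k) / 10


def about_15k(price_k):
    # Publication as a possibility distribution: triangular around $15K.
    return max(0.0, 1.0 - abs(price_k - 15) / 5)


domain = range(0, 41)  # prices in $K, discretized for the sketch
poss = possibility(cheap, about_15k, domain)
nec = necessity(cheap, about_15k, domain)

# Ranking a conjunction of per-attribute degrees with "min" or "product".
degrees = [poss, 0.9]  # 0.9 stands in for a second attribute's degree
score_min = min(degrees)
score_product = math.prod(degrees)
print(poss, nec, score_min, score_product)
```

On this grid the necessity does not exceed the possibility, which is the expected relationship when the publication's distribution reaches 1 somewhere: the first measure is optimistic and the second is conservative.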

Page 16: Management of Uncertainty in Publish/Subscribe Systems

Graph-structured Model

• Model
  – Pub: directed graph
  – Sub: directed graph pattern
  – Semantic: ontology
• Matching
  – Pattern graph maps to data graph if the topology (structure) of the two graphs matches and all variable constraints (literal and ontology) are satisfied
• Ranking

[Figure: a class taxonomy containing Publication, Academic Publication, Jacobsen’s Publications, Report, Proceedings, WWW, VLDB and SIGMOD, and an example data graph for PAPER17 with edge labels AUTHOR, CONFERENCE, LOCATION and YEAR and values “Arno Jacobsen”, SIGMOD, “California” and “2001”.]
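Graph-pattern matching with literal and ontology constraints can be sketched with a small backtracking matcher over triples. Everything structural here is an assumption: the original figure is not recoverable, so the edges of the data graph and the parent links of the taxonomy (`PARENT`) are guesses built from the labels on the slide, and the function names are hypothetical.

```python
# Hypothetical taxonomy: child class -> parent class.
PARENT = {
    "SIGMOD": "Proceedings", "VLDB": "Proceedings", "WWW": "Proceedings",
    "Proceedings": "Academic Publication", "Report": "Academic Publication",
    "Academic Publication": "Publication",
}


def is_a(term, cls):
    """Walk up the taxonomy from `term` looking for `cls`."""
    while term is not None:
        if term == cls:
            return True
        term = PARENT.get(term)
    return False


def unify(term, value, binding):
    """Bind a pattern variable (prefixed with '?') or compare a constant."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(pattern, graph, constraints, binding=None):
    """Return variable bindings mapping the pattern onto the graph, or None."""
    binding = dict(binding or {})
    if not pattern:
        ok = all(test(binding[var]) for var, test in constraints.items())
        return binding if ok else None
    (s, p, o), rest = pattern[0], pattern[1:]
    for gs, gp, go in graph:
        if gp != p:
            continue
        trial = dict(binding)
        if unify(s, gs, trial) and unify(o, go, trial):
            result = match(rest, graph, constraints, trial)
            if result is not None:
                return result
    return None


# Publication: directed graph as (subject, edge label, object) triples.
graph = [
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "LOCATION", "California"),
    ("PAPER17", "YEAR", "2001"),
]
# Subscription: graph pattern plus one literal and one ontology constraint.
pattern = [("?paper", "AUTHOR", "?author"), ("?paper", "CONFERENCE", "?venue")]
constraints = {
    "?author": lambda v: v == "Arno Jacobsen",
    "?venue": lambda v: is_a(v, "Proceedings"),
}
result = match(pattern, graph, constraints)
print(result)
```

The pattern matches because its structure maps onto the data graph and both constraints hold; replacing the ontology constraint with `is_a(v, "Report")` would make the match fail under the assumed taxonomy.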

Page 17: Management of Uncertainty in Publish/Subscribe Systems

Current Status

• Work to date
  – Developed an approximate p/s model to express uncertainties and an efficient algorithm for approximate matching
  – Developed covering and merging optimizations for approximate content-based routing
  – Developed a graph-based p/s architecture applied to the dissemination of RDF metadata (including RSS)
  – Developed two novel algorithms (covering and merging) for creating a distributed content-based routing network for graph-structured data

Page 18: Management of Uncertainty in Publish/Subscribe Systems

Comments from Previous Meeting

• Probability model
• Qualitative similarity measure
• Validate our results
  – Real data set
  – Interactive evaluation

Page 19: Management of Uncertainty in Publish/Subscribe Systems

Research Plan I

• Membership Function Mining
  – Get a real data set
  – “Learn” the membership function
    • Clustering: K-means, DBSCAN
    • Regression: neural network
• Semantic Matching and Routing Computation
  – Matching on ontology
  – Covering on ontology
  – Merging on ontology

Page 20: Management of Uncertainty in Publish/Subscribe Systems

Research Plan II

• Design an experiment to validate the mining results

• Design a method to combine possibility measure and necessity measure for ranking

• Push thresholds down the matching plan to increase the efficiency of the matching algorithm

• Use probabilities as an alternative to model uncertainties and imprecision