Management of Uncertainty in Publish/Subscribe Systems
Haifeng Liu
Department of Computer Science
University of Toronto
Publish/Subscribe Model
[Figure: publishers (stock markets NYSE, NASDAQ, TSX) send publications (IBM=84, MSFT=27, INTC=19, JNJ=58, ORCL=12, HON=24, AMGN=58) into a broker network; subscribers register subscriptions (IBM > 85, ORCL < 10, JNJ > 60) and receive notifications.]
Applications Enabled by Publish/Subscribe
• Selective information dissemination
• Information filtering on the Internet
• Location-based services
• Workflow management
• Intra-enterprise process automation
• Logistics and supply chain management
• Enterprise application integration
• Network monitoring and (distributed) system management
Types of Uncertainties
• Lack of information
  – Buy a cheap car
• Imprecision
  – Sensor data: temperature 15~20ºC
  – Location: (x, y) at time t, (x’, y’) at time t+1
• Semantics
  – Synonyms: vehicle vs. automobile
  – Class taxonomy: CD player vs. electronics
  – Different expressions: 5 years of experience vs. graduated in 2001
Problem: manage uncertainty, imprecision, and semantics in publish/subscribe systems
Agenda
• Distributed Publish/Subscribe Model and Content-based Routing
• Uncertainties in Publish/Subscribe
• Research Challenges
• Approximate P/S Model
• Graph-structured Model
• Current Status
• Research Plan
Publish/Subscribe Messages
• Advertisement (ad)
  – Publication pattern a publisher uses to announce the set of publications it will publish
  – E.g. { (stock, any), (price, any) }
• Subscription (sub)
  – Specification of a user’s interest
  – E.g. (stock = “yahoo”) & (price ≤ $35)
• Publication (pub)
  – Information, data, event
  – E.g. { (stock, “yahoo”), (price, $32.79) }
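A minimal Python sketch of the three message types, assuming a simple encoding that the slides do not prescribe: a publication is a dictionary of attribute/value pairs, a subscription is a conjunction of (attribute, operator, value) predicates, and an advertisement lists the attributes a publisher will publish with the value `any`. The names `Predicate`, `Subscription`, `intersects`, and `OPS` are hypothetical, and `intersects` is a deliberately simplified ad/sub test that only checks attribute names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical encoding, not the system's actual message format.
# Publication: attribute -> value; "$" amounts are plain floats here.
Publication = Dict[str, Any]

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
}


@dataclass(frozen=True)
class Predicate:
    attr: str
    op: str
    value: Any

    def holds(self, pub: Publication) -> bool:
        # A predicate on an attribute the publication lacks is not satisfied.
        return self.attr in pub and OPS[self.op](pub[self.attr], self.value)


@dataclass(frozen=True)
class Subscription:
    predicates: Tuple[Predicate, ...]

    def matches(self, pub: Publication) -> bool:
        # Crisp matching: the conjunction holds only if every predicate holds.
        return all(p.holds(pub) for p in self.predicates)


def intersects(ad: Dict[str, Any], sub: Subscription) -> bool:
    # Simplified ad/sub intersection for "any"-valued ads: the publisher can
    # only satisfy the subscription if it advertises every constrained attribute.
    return all(p.attr in ad for p in sub.predicates)


ad = {"stock": "any", "price": "any"}
sub = Subscription((Predicate("stock", "=", "yahoo"), Predicate("price", "<=", 35.0)))
pub = {"stock": "yahoo", "price": 32.79}
print(intersects(ad, sub), sub.matches(pub))  # True True
```

Approximate matching, discussed later in the talk, replaces the Boolean result of `matches` with a degree of match.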
Content-based Routing
Advertising
[Figure: publishers and subscribers attached to a distributed overlay broker network; an advertisement is propagated through the brokers.]
*Adopted from SIENA, Gryphon, REBECA and Hermes
Content-based Routing
Subscribing
[Figure: publishers and subscribers attached to a distributed overlay broker network; a subscription is propagated through the brokers.]
Content-based Routing
Publishing
[Figure: publishers and subscribers attached to a distributed overlay broker network; a publication is propagated through the brokers.]
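Subscription Forwarding I
Covering optimization
[Figure: publishers and subscribers attached to a distributed overlay broker network; subscriptions S1 and S2 and publication P are routed through the brokers.]
S1: (car = Honda) & (price ≤ $30K)
S2: (car = Honda) & (price ≤ $25K)
S1 covers S2: every publication that matches S2 also matches S1, so a broker that has already forwarded S1 does not need to forward S2 along the same path.
P: { (car, Honda), (price, $20K) }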
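The covering relation can be checked with a small sufficient test, sketched here under the assumption that subscriptions are conjunctions using only `=` and `≤` (written `<=` in code) with at most one predicate per attribute. `implies`, `covers`, and `matches` are hypothetical helper names; dollar amounts are plain integers.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical encoding: a subscription is a list of (attribute, op, value)
# predicates joined by AND. Only "=" and "<=" are handled in this sketch.
Pred = Tuple[str, str, Any]


def implies(p2: Pred, p1: Pred) -> bool:
    """True if every value satisfying p2 also satisfies p1."""
    a2, op2, v2 = p2
    a1, op1, v1 = p1
    if a1 != a2:
        return False
    if op1 == "=":
        return op2 == "=" and v2 == v1
    if op1 == "<=":
        return op2 in ("=", "<=") and v2 <= v1
    return False


def covers(s1: List[Pred], s2: List[Pred]) -> bool:
    # Sufficient test: if each predicate of S1 is implied by some predicate
    # of S2, any publication matching S2 also matches S1.
    return all(any(implies(p2, p1) for p2 in s2) for p1 in s1)


def matches(sub: List[Pred], pub: Dict[str, Any]) -> bool:
    for attr, op, value in sub:
        if attr not in pub:
            return False
        if op == "=" and pub[attr] != value:
            return False
        if op == "<=" and not pub[attr] <= value:
            return False
    return True


S1 = [("car", "=", "Honda"), ("price", "<=", 30_000)]
S2 = [("car", "=", "Honda"), ("price", "<=", 25_000)]
P = {"car": "Honda", "price": 20_000}

print(covers(S1, S2), covers(S2, S1))  # True False
print(matches(S1, P), matches(S2, P))  # True True
```

`covers(S2, S1)` is false because S1 also admits Hondas priced between $25K and $30K, which S2 rejects.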
Subscription Forwarding II
Merging optimization
[Figure: publishers and subscribers attached to a distributed overlay broker network; subscriptions S1 and S2 are merged into S’ inside the broker network.]
S1: (car = Honda) & (price ≤ $30K)
S2: (car = Toyota) & (price ≤ $25K)
S’: (car = any) & (price ≤ $30K)
S’ covers both S1 and S2, so a broker can forward the single subscription S’ in place of S1 and S2.
P: { (car, Honda), (price, $20K) }
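One simple way to obtain a merged subscription like S’ is to widen each attribute until both inputs are covered. The rule below (keep equal `=` values, take the larger `<=` bound, otherwise drop the constraint so the attribute becomes `any`) is an assumption chosen for illustration, not necessarily the merging algorithm used in this work; `merge` and `matches` are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Hypothetical encoding: attribute -> (op, value), with op "=" or "<=".
Sub = Dict[str, Tuple[str, Any]]


def merge(s1: Sub, s2: Sub) -> Sub:
    """Widen s1 and s2 into one subscription that covers both.

    Illustrative rule: keep identical "=" constraints, keep the larger "<="
    bound, and drop every other constraint (the attribute becomes "any").
    """
    merged: Sub = {}
    for attr in s1.keys() & s2.keys():
        (op1, v1), (op2, v2) = s1[attr], s2[attr]
        if op1 == op2 == "=" and v1 == v2:
            merged[attr] = ("=", v1)
        elif op1 == op2 == "<=":
            merged[attr] = ("<=", max(v1, v2))
        # any other combination: no constraint on attr, i.e. (attr = any)
    return merged


def matches(sub: Sub, pub: Dict[str, Any]) -> bool:
    for attr, (op, value) in sub.items():
        if attr not in pub:
            return False
        if op == "=" and pub[attr] != value:
            return False
        if op == "<=" and not pub[attr] <= value:
            return False
    return True


S1 = {"car": ("=", "Honda"), "price": ("<=", 30_000)}
S2 = {"car": ("=", "Toyota"), "price": ("<=", 25_000)}
S_prime = merge(S1, S2)
P = {"car": "Honda", "price": 20_000}
print(S_prime)              # {'price': ('<=', 30000)}
print(matches(S_prime, P))  # True
```

The merge can be imperfect: S’ also matches `{"car": "Ford", "price": 28_000}`, which neither S1 nor S2 asked for, so such publications would still have to be filtered out by brokers holding the original S1 and S2.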
Publish/Subscribe Router
• Forwarding of advertisements
  – Via flooding
• Forwarding of subscriptions
  – Forward along reverse ad path
    • Matching of ad and sub (intersecting)
  – Optimizations
    • Covering/merging of subs
• Forwarding of publications
  – Forward along reverse sub path
    • Matching of sub and pub
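The forwarding rules on this slide can be simulated on a toy overlay. This sketch assumes a tree-shaped broker network with a single publisher, floods one advertisement, sends each subscription back along the reverse advertisement path, and routes publications along the reverse subscription paths; covering/merging and ad/sub intersection are left out, and all names (`flood_ad`, `forward_sub`, `route_pub`, brokers `A` to `D`) are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

# Hypothetical acyclic (tree) overlay: broker -> neighbouring brokers.
Overlay = Dict[str, List[str]]
Filter = Callable[[Dict[str, Any]], bool]


def flood_ad(overlay: Overlay, origin: str) -> Dict[str, Optional[str]]:
    """Flood an advertisement; each broker remembers the hop it came from."""
    came_from: Dict[str, Optional[str]] = {origin: None}
    queue = deque([origin])
    while queue:
        broker = queue.popleft()
        for nb in overlay[broker]:
            if nb not in came_from:
                came_from[nb] = broker
                queue.append(nb)
    return came_from


def forward_sub(ad_hop: Dict[str, Optional[str]],
                sub_table: Dict[str, Dict[str, Optional[str]]],
                sub_id: str, origin: str) -> None:
    """Send a subscription back along the reverse advertisement path.

    sub_table[b][sub_id] is the hop the subscription arrived from at broker b
    (None means the subscriber is attached locally to b).
    """
    broker: Optional[str] = origin
    prev: Optional[str] = None
    while broker is not None:
        sub_table.setdefault(broker, {})[sub_id] = prev
        broker, prev = ad_hop[broker], broker


def route_pub(sub_table: Dict[str, Dict[str, Optional[str]]],
              filters: Dict[str, Filter], pub: Dict[str, Any],
              origin: str) -> Set[Tuple[str, str]]:
    """Send a publication along reverse subscription paths of matching subs."""
    delivered: Set[Tuple[str, str]] = set()
    queue = deque([origin])
    while queue:
        broker = queue.popleft()
        next_hops = set()
        for sub_id, hop in sub_table.get(broker, {}).items():
            if not filters[sub_id](pub):
                continue                         # sub/pub matching at this broker
            if hop is None:
                delivered.add((broker, sub_id))  # local subscriber
            else:
                next_hops.add(hop)
        queue.extend(sorted(next_hops))
    return delivered


overlay = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
ad_hop = flood_ad(overlay, "A")                  # publisher attached to broker A
sub_table: Dict[str, Dict[str, Optional[str]]] = {}
filters: Dict[str, Filter] = {
    "s1": lambda p: p["stock"] == "yahoo" and p["price"] <= 35,  # at broker D
    "s2": lambda p: p["stock"] == "yahoo" and p["price"] > 40,   # at broker C
}
forward_sub(ad_hop, sub_table, "s1", "D")
forward_sub(ad_hop, sub_table, "s2", "C")
print(route_pub(sub_table, filters, {"stock": "yahoo", "price": 32.79}, "A"))
# {('D', 's1')}
```

Because subscriptions only travel toward publishers that advertised, a publication never visits broker C here unless some subscription behind C matches it.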
Uncertainties in Distributed Publish/Subscribe System
• Messages (representation: modeling)
  – Uncertain subscription
  – Uncertain publication
• Relations (computation: matching, covering, merging)
  – Between sub and pub
  – Between sub and sub
• Result (aggregation: ranking)
  – Return top K matches
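Returning the top K matches can be sketched with Python's `heapq` module, under the assumption that approximate matching has already produced a degree in [0, 1] for each subscription against one publication. The degrees in the example are made up, and `top_k` is a hypothetical helper.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(degrees: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k best (subscription id, match degree) pairs.

    Degrees of 0 mean "no match" and are dropped; ties are broken by id.
    """
    candidates = [(sid, d) for sid, d in degrees.items() if d > 0.0]
    return heapq.nsmallest(k, candidates, key=lambda item: (-item[1], item[0]))


degrees = {"s1": 0.9, "s2": 0.4, "s3": 0.9, "s4": 0.0, "s5": 0.7}
print(top_k(degrees, 3))  # [('s1', 0.9), ('s3', 0.9), ('s5', 0.7)]
```

`heapq.nsmallest` with a negated score avoids sorting all candidates when K is much smaller than the number of subscriptions.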
Research Challenges
• Develop a publish/subscribe model to express uncertainties/semantics in publications and subscriptions
• Model approximate matching and semantic matching
• Model approximate covering/merging and semantic covering/merging
• Scalability to a large number of subscribers and a high publishing rate
Approximate Matching Model
• Model
  – Sub: fuzzy set
  – Pub: possibility distribution
• Matching
  – Possibility measure
  – Necessity measure
• Ranking
  – “min” or “product” for conjunction
  – “max” or “plus” for disjunction
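Possibility and necessity measures follow the standard definitions from possibility theory: for a subscription fuzzy set with membership μ and a publication possibility distribution π, Π = max_x min(μ(x), π(x)) and N = min_x max(μ(x), 1 − π(x)). The sketch below evaluates both on a discretised toy price domain; the trapezoidal shapes, the domain, and all function names are assumptions for illustration, and only the “min”/“product” conjunction and “max” disjunction are shown.

```python
import math
from typing import Callable, Iterable, List

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if x <= a or x >= d:
            return 0.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu


def possibility(sub: Membership, pub: Membership, domain: Iterable[float]) -> float:
    # Pi = max_x min(mu_sub(x), pi_pub(x)): how plausible a match is.
    return max(min(sub(x), pub(x)) for x in domain)


def necessity(sub: Membership, pub: Membership, domain: Iterable[float]) -> float:
    # N = min_x max(mu_sub(x), 1 - pi_pub(x)): how certain a match is.
    return min(max(sub(x), 1.0 - pub(x)) for x in domain)


def conjunction(degrees: List[float], use_product: bool = False) -> float:
    return math.prod(degrees) if use_product else min(degrees)


def disjunction(degrees: List[float]) -> float:
    return max(degrees)


prices = range(0, 61)                  # discretised price domain (toy)
cheap = trapezoid(-1, 0, 30, 40)       # sub: "price is cheap"
about_32 = trapezoid(30, 32, 32, 34)   # pub: "price is about 32"
poss = possibility(cheap, about_32, prices)
nec = necessity(cheap, about_32, prices)
print(f"{poss:.2f} {nec:.2f}")  # 0.80 0.70
```

Here a match is fairly plausible (Π = 0.8) and somewhat less certain (N = 0.7); degrees for several predicates would then be combined with `conjunction` or `disjunction`.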
Graph-structured Model
• Model
  – Pub: directed graph
  – Sub: directed graph pattern
  – Semantics: ontology
• Matching
  – A pattern graph maps to a data graph if the topology (structure) of the two graphs matches and all variable constraints (literal and ontology) are satisfied
• Ranking
[Figure: example data graph for PAPER17 with edges labeled AUTHOR, CONFERENCE, LOCATION, and YEAR pointing to values such as “Arno Jacobsen”, SIGMOD, “California”, and “2001”, alongside an ontology of classes (Publication, Academic Publication, Jacobsen’s Publications, Report, Proceedings, WWW, VLDB).]
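Graph matching as described here (the structure must match and every variable constraint must hold) can be sketched as backtracking over RDF-style triples. Variables start with `?`; constraints are either literal checks or ontology checks via a child-to-parent class map. The toy graph, the `TYPE` edge, and the subclass links are hypothetical, loosely based on the labels in the example figure, and ranking is not covered.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def subclass_of(ontology: Dict[str, str], cls: Optional[str], ancestor: str) -> bool:
    """Follow child -> parent links in a (hypothetical) single-parent ontology."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ontology.get(cls)
    return False


def unify(term: str, value: str, binding: Bindings) -> bool:
    if not term.startswith("?"):   # constant: must match literally
        return term == value
    if term in binding:            # variable already bound by another edge
        return binding[term] == value
    binding[term] = value
    return True


def match(pattern: List[Triple], data: Set[Triple],
          constraints: Dict[str, Callable[[str], bool]],
          binding: Optional[Bindings] = None) -> List[Bindings]:
    """Backtracking search for variable bindings that embed pattern in data."""
    binding = {} if binding is None else binding
    if not pattern:
        ok = all(v in binding and check(binding[v]) for v, check in constraints.items())
        return [binding] if ok else []
    first, rest = pattern[0], pattern[1:]
    results: List[Bindings] = []
    for triple in sorted(data):
        trial = dict(binding)
        if all(unify(t, d, trial) for t, d in zip(first, triple)):
            results.extend(match(rest, data, constraints, trial))
    return results


data = {
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "YEAR", "2001"),
    ("PAPER17", "TYPE", "Proceedings"),        # hypothetical edge
}
ontology = {"Proceedings": "Academic Publication", "Report": "Academic Publication",
            "Academic Publication": "Publication"}   # hypothetical links
pattern = [("?p", "AUTHOR", "Arno Jacobsen"), ("?p", "TYPE", "?t"), ("?p", "YEAR", "?y")]
constraints = {
    "?t": lambda t: subclass_of(ontology, t, "Publication"),   # ontology constraint
    "?y": lambda y: int(y) >= 2000,                            # literal constraint
}
print(match(pattern, data, constraints))
# [{'?p': 'PAPER17', '?t': 'Proceedings', '?y': '2001'}]
```

Shared variables such as `?p` enforce the topology: all three pattern edges must leave the same data node.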
Current Status
• Work to date
  – Developed an approximate p/s model that expresses uncertainty, and an efficient algorithm for approximate matching
  – Developed covering and merging optimizations for approximate content-based routing
  – Developed a graph-based p/s architecture applied to the dissemination of RDF metadata (including RSS)
  – Developed two novel algorithms (covering and merging) for creating a distributed content-based routing network for graph-structured data
Comments from Previous Meeting
• Probability model
• Qualitative similarity measure
• Validate our results
  – Real data set
  – Interactive evaluation
Research Plan I
• Membership function mining
  – Get a real data set
  – “Learn” the membership function
    • Clustering: K-means, DBSCAN
    • Regression: neural network
• Semantic matching and routing computation
  – Matching on ontology
  – Covering on ontology
  – Merging on ontology
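One way the “learn the membership function” step could look, assuming a single numeric attribute: cluster observed values with 1-D k-means (Lloyd’s algorithm) and turn each cluster into a Gaussian membership function centred on its mean. The Gaussian shape, the deterministic initialisation, the made-up price data, and the names `kmeans_1d` and `gaussian_membership` are all assumptions; DBSCAN and regression are not shown.

```python
import math
import statistics
from typing import Callable, List, Tuple


def kmeans_1d(values: List[float], k: int,
              iters: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Plain Lloyd's k-means on one attribute, with deterministic initialisation."""
    xs = sorted(values)
    # Spread the initial centroids across the sorted data.
    centers = [xs[i * (len(xs) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters: List[List[float]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def gaussian_membership(cluster: List[float],
                        min_sigma: float = 1e-6) -> Callable[[float], float]:
    """Turn one cluster into a Gaussian membership function peaking at its mean."""
    mu = statistics.mean(cluster)
    sigma = max(statistics.pstdev(cluster), min_sigma)
    return lambda x: math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Made-up car prices in $K, standing in for a real data set.
prices = [8, 9, 10, 11, 12, 28, 30, 31, 33, 50, 52, 55]
centers, clusters = kmeans_1d(prices, k=3)
cheap, mid, expensive = (gaussian_membership(c) for c in clusters)
print([len(c) for c in clusters])  # [5, 4, 3]
```

The learned functions could then stand in for hand-written ones such as “cheap” when subscriptions are expressed as fuzzy sets.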
Research Plan II
• Design an experiment to validate the mining results
• Design a method to combine possibility measure and necessity measure for ranking
• Push thresholds down the matching plan to increase the efficiency of the matching algorithm
• Use probabilities as an alternative to model uncertainties and imprecision
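Pushing a threshold into matching can be sketched as early termination: when a conjunction is scored with “min” or “product” over degrees in [0, 1], the running score never increases, so evaluation can stop as soon as it drops below the threshold. This is one possible reading of the plan item; `match_with_threshold`, `is_honda`, and `is_cheap` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Pub = Dict[str, Any]
Degree = Callable[[Pub], float]


def match_with_threshold(preds: List[Degree], pub: Pub, threshold: float,
                         use_product: bool = False) -> Tuple[Optional[float], int]:
    """Score a conjunction, stopping once the score falls below threshold.

    Returns (degree, or None if pruned; number of predicates evaluated).
    """
    degree, evaluated = 1.0, 0
    for pred in preds:
        d = pred(pub)
        evaluated += 1
        degree = degree * d if use_product else min(degree, d)
        if degree < threshold:
            # min/product of [0, 1] values can only shrink: safe to stop.
            return None, evaluated
    return degree, evaluated


def is_honda(pub: Pub) -> float:
    return 1.0 if pub["car"] == "Honda" else 0.0


def is_cheap(pub: Pub) -> float:
    # 1 up to $30K, falling linearly to 0 at $40K (toy membership, in $K).
    return max(0.0, min(1.0, (40 - pub["price"]) / 10))


plan = [is_honda, is_cheap]   # most selective predicate first
print(match_with_threshold(plan, {"car": "Toyota", "price": 20}, 0.5))  # (None, 1)
print(match_with_threshold(plan, {"car": "Honda", "price": 32}, 0.5))   # (0.8, 2)
```

Ordering the plan so that the most selective predicate runs first maximises early pruning: the Toyota publication is rejected after a single evaluation.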