An Expressive and Scalable XML Event Service for Service-Oriented Computing
Yi Huang
Extreme! Computing Lab, Department of Computer Science
Indiana University
Publish/Subscribe Event Services
Subscribers express interests and are later notified of relevant data from publishers
Event (brokering) services manage subscriptions and deliver events
Enables loosely coupled application integration
Scalability for Pub/Sub Systems
Scalability: the capability of a system to maintain QoS under an increased load when resources are added.
– Scale up (add resources to a single node)
– Scale out (add more nodes)
Load scalability: the ability of a distributed system to easily expand its resource pool to accommodate heavier loads.
– Number of publishers, number of consumers, number of subscriptions, message rate (peak, average), message size, etc.
Geographic scalability
– Accommodate clients (publishers/consumers) across the Internet
Administrative scalability
– Enable different organizations to share a publish/subscribe service
Challenge: Expressiveness vs. Scalability
Expressiveness and scalability have been treated as a trade-off in publish/subscribe systems
Topic-based subscriptions need the least processing power => best scalability
Content-based subscriptions (name-value pairs)
– More expressive
– Consumers can get exactly the messages they need
– Reduce unnecessary transmission on the WAN
– Need more processing power at each broker => less scalable
– Available solutions: covering subscriptions (SIENA); organizing attributes (Gryphon)
XPath-based subscriptions
– Filter XML messages based on message structure and message content, e.g. /a/b[@x="1"]
– Very expensive to parse XML messages and evaluate them against XPath subscriptions
– Create more challenges to scalability
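To make the cost of XPath-based subscriptions concrete, here is a minimal sketch of evaluating the slide's example expression against one message. It assumes Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath; the helper name `matches` and the sample message are illustrative and are not part of WS-Messenger or OpenPS. The sketch handles only paths whose first step is a plain element name.

```python
import xml.etree.ElementTree as ET


def matches(xml_text: str, xpath: str) -> bool:
    """Return True if an absolute path such as /a/b[@x="1"] selects a node.

    Illustrative helper: ElementTree cannot evaluate absolute paths on an
    element, so the first step is compared with the root tag by hand and
    the remainder is evaluated relative to the root.
    """
    root = ET.fromstring(xml_text)  # every message must be parsed first
    steps = xpath.lstrip("/").split("/", 1)
    if steps[0] != root.tag:
        return False
    if len(steps) == 1:
        return True
    return root.find("./" + steps[1]) is not None


message = '<a><b x="1">hello</b><b x="2"/></a>'
print(matches(message, '/a/b[@x="1"]'))
print(matches(message, '/a/b[@x="3"]'))
```

A broker holding many such subscriptions repeats the evaluation step for each of them on every message, which is the processing load the slide refers to.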
Examples of Content-based Filtering
Personalized news delivery
– All the sports news.
– All the articles written by John Smith.
– All the articles referring to the document whose id is 1234.
– All the events that will take place in Denver.
System monitoring
– All the log messages with “Error” as status.
– All the log messages from the simulation service.
Stock quotes
– IBM stock value > 100
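The examples above can be written as name-value constraints. The following is a minimal sketch, assuming events are flat dictionaries and a subscription is a list of (attribute, operator, value) triples; this representation and the attribute names are chosen for illustration and are not taken from any particular system.

```python
import operator

# Comparison operators a subscription may use (illustrative subset).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def satisfies(event: dict, subscription: list) -> bool:
    """Return True if the event meets every constraint in the subscription."""
    for name, op, value in subscription:
        if name not in event or not OPS[op](event[name], value):
            return False
    return True


# "IBM stock value > 100"
ibm_above_100 = [("symbol", "=", "IBM"), ("price", ">", 100)]
# "All the log messages with Error as status"
error_logs = [("type", "=", "log"), ("status", "=", "Error")]

quote = {"type": "stock", "symbol": "IBM", "price": 101.5}
print(satisfies(quote, ibm_above_100))
print(satisfies(quote, error_logs))
```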
Applications of Event Brokering Services
Application integration
Personalized news delivery
Online auctions
Stock tickers
Human resource management (PeopleSoft)
Network and application monitoring
…
Infrastructure to Support Scientific Research
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it’
--John Taylor, Director General of Research Councils UK, Office of Science and Technology
Scientific research is becoming data centric
– Unify theory, experiment, and simulation
– Using data exploration and data mining
– Data captured by instruments
– Data generated by simulations
– Processed by software
– Scientist analyzes databases/files
Need International Grid Infrastructure to support e-Science.
Services and SOA
Service: a “web server” that runs an application for you.
– You send it requests (XML documents); it processes the information and sends replies (notifications) when it is done.
Web service: uses standard web technology to create services, e.g. XML, SOAP, WSDL.
Service-Oriented Architecture (SOA)
– Promotes a pluggable framework to add new features and to virtualize access to resources
Combining SOA and an event service enables loosely coupled systems
[Figure: a service with its business logic. 1. Service request; 2. Run weather simulation (WRF); 3. Publish notifications]
My Contributions in PhD
Participated in SOA design & implementation for eScience infrastructure in LEAD project
Created an Internet-scale, Web-service-based event service for SOA: the nervous system of the LEAD SOA
– Made it reliable and scalable
– Integrated with various services
– Tools for management and debugging
Research problem addressed: expressiveness vs. scalability in XML publish/subscribe systems, especially administrative scalability.
Outline
Achieving Scalable and Expressive XML Event Services
Event Service in Service-oriented Scientific Workflow (LEAD project)
Achieving Scalable and Expressive XML Event Services
Separation of concerns with a layered model
– WS-Messenger local broker for XML and WS
– OpenPS broker network
Filtering Result Summary (FRS)
– Reduce complicated event content matching to simple string matching in the routing broker
Load Scalability in LAN
Achieved with
– Message queues
– Load sharing among servers
– Load balancing
– Caching
Achieved in most commercial systems.
– We use existing approaches in our systems.
– Not our research focus.
Difficulties for Achieving Geographical Scalability
Long network latency, limited bandwidth
– RTT (from Indiana to Germany) is over 200 ms
Communication in WAN is inherently unreliable and virtually always point-to-point, whereas LAN communication is generally highly reliable and based on broadcasting
Centralized solutions lead to a waste of network resources and degrade system performance
Deployment, monitoring, updating and debugging
Firewalls
– HTTP usually does not keep connections open
– Cost of the initial three-way handshake in TCP
Single Broker vs. Broker Network
[Figure: two diagrams. In each, a publisher and a local consumer at Site A deliver messages (Msg) over the Internet to Consumers 1 to n at Site B. The first diagram uses a single broker; the second uses Broker A at Site A and Broker B at Site B. Legend: end client; broker with router; router only.]
Duplicate transmissions of the same messages
Takes a long time to get acknowledgements
Share the transmission bandwidth
Brokers take advantage of locality
[Figure panels: single broker; 2-broker network; multi-broker network]
Filtering-based Routing
Subscription propagation: subscriptions are propagated to every broker, leaving state along the path
Notification delivery: matching notifications follow (backwards) the path set by subscriptions
Reducing subscription propagation
– Avoid propagating a subscription when a filter that covers it has already been forwarded
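The propagation rule above can be sketched as follows. This is a simplified illustration, assuming equality-only filters stored as dictionaries; the `Broker` class and `covers` function are hypothetical names, and real covering relations (as in SIENA) handle richer operators than this.

```python
def covers(general: dict, specific: dict) -> bool:
    """True if every event matching `specific` also matches `general`.

    With equality-only filters this holds when each constraint of
    `general` also appears in `specific`.
    """
    return all(specific.get(name) == value for name, value in general.items())


class Broker:
    """Keeps the filters this broker has already forwarded upstream."""

    def __init__(self):
        self.forwarded = []

    def subscribe(self, flt: dict) -> bool:
        """Return True if the subscription must be propagated further."""
        if any(covers(sent, flt) for sent in self.forwarded):
            return False  # a covering filter already draws these events here
        self.forwarded.append(flt)
        return True


broker = Broker()
print(broker.subscribe({"type": "news"}))
print(broker.subscribe({"type": "news", "topic": "sports"}))
```

In this sketch the second subscription is not propagated, because the first filter already causes all news events to reach this broker.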
Difficulties for Achieving Administrative Scalability
Hindered by conflicting resource usage, management, and security policies in multiple, independent administrative domains
– E.g. Cannot trust a third-party service to inspect message content for content-based filtering
Demands for Internet-scale Sharable Services
Emerging with the concept of using the Internet as a computing platform
Software as a service
Help developers build the next generation of composite applications without maintaining infrastructure
Require administrative scalability
Amazon Simple Queue Service
Amazon Simple Storage Service
Microsoft BizTalk Service (instead of BizTalk Server)
– Newly announced 4/24/2007
– Added simple pub/sub support on 5/2/2007; allows multiple clients to subscribe to a service and receive notifications.
– Not production-level yet
Related Work on Messaging Middleware
[Figure: systems plotted by expressiveness (topic-based queue, topic-based pub/sub, content-based simple pub/sub, content-based XML pub/sub) against scalability (load; geographic + load; administrative + geographic + load). System groups shown:]
– TIBCO, IBM MQ Series, Microsoft BizTalk Server, SCRIBE, Bayeux, Echo
– OpenJMS, ActiveMQ, Oracle Advanced Queuing, Apache ServiceMix, WS-Messenger
– SIENA, Gryphon, HERMES, JEDI
– Le Subscribe, Xyleme, Elvin
– ONYX, XRoute, NaradaBrokering
– XFilter, YFilter, XAOS, XSQ, XTrie, IndexFilter, XMLTK, Apache ServiceMix, WS-Messenger
– IBM MQ Series, Microsoft MQ, Fiorano MQ, RabbitMQ
– OpenJMS, ActiveMQ, Oracle Advanced Queuing
– Microsoft BizTalk Service
– Amazon Simple Queuing Service
– OpenPS (appears three times)
Limitations of Existing Content-based Pub/Sub Services
Lack of administrative scalability (not sharable)
– Homogeneity: require the same broker to be deployed across the Internet in every organization
– Need to trust brokers: brokers need to inspect message content to make routing decisions
– Not interoperable: clients and brokers need the same implementation, either a C/C++ or a Java library.
=> Not economical: expensive to deploy and maintain a global network for one project
Gap:
– Few projects can afford to deploy and maintain a global network,
– Many projects need a global messaging system.
How to create a sharable pub/sub service?
Our Solution: Separation of Concerns
LPS Model
Layered Publish/Subscribe (LPS) model: a 5-layer model built on top of the TCP/IP model
[Figure: a publisher delivers messages to a publisher broker, which connects through an intermediary broker in the broker network (over the Internet) to a consumer broker, which delivers messages to the consumer. The publisher broker and the consumer broker each contain five layers: Event Application, Event Transport, Event Distribution, Event Filtering and Event Transformation. The intermediary broker contains only the Event Transport and Event Distribution layers.]
Traditional approaches in distributed content-based filtering include all 5 layers in intermediary brokers.
Separation of Concerns
Separate brokers into local brokers and routing brokers
Local brokers (WS-Messenger or another broker) are set up locally
– Users can trust them to inspect message content
– Act as local agents
A global network of routing brokers (the OpenPS network) can interact with local brokers using Web service interfaces.
– A brokering service for local brokers
– Handles global message dissemination
– Sharable by many local brokers
– Acts like a network of post offices
Make the routing brokers as simple as possible
– Improves scalability and interoperability.
– Limitation of the target deployment platform: PlanetLab global testbed
  • Shared virtual machines with CPU usage > 95%.
Hierarchical Broker Network
Analogous to postal service networks
Economies of scale
Filtering Result Summary (FRS)
Local brokers attach an FRS to messages
Observation 1: Topic-based message filtering in the intermediary broker can be reduced to 1-to-1 string matching.
Observation 2: Content-based message filtering in the intermediary broker can be reduced to any-to-any string matching.
Simplify complicated message matching to string matching.
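One way to read the two observations is sketched below. It assumes that a local broker has already evaluated the message and attached, as the FRS, the topic or the subscription expressions the message satisfies, so that a routing broker only compares strings. The routing-table layout, the `route` function and the broker names are hypothetical and are not the OpenPS implementation.

```python
def route(frs: list, table: dict) -> set:
    """Return the next hops interested in a message, by string lookup only.

    frs   -- strings attached by the local broker (one topic, or several
             subscription expressions the message satisfies)
    table -- maps a subscription string to the set of next hops that hold it
    """
    hops = set()
    for summary in frs:  # any FRS string may equal any subscription string
        hops |= table.get(summary, set())
    return hops


table = {
    "weather/storms": {"brokerB"},
    '/a/b[@x="1"]': {"brokerC"},
    "/log[status='Error']": {"brokerB", "brokerD"},
}

# Topic-based: a single FRS string, 1-to-1 comparison.
print(route(["weather/storms"], table))
# Content-based: several FRS strings, any-to-any comparison.
print(sorted(route(['/a/b[@x="1"]', "/log[status='Error']"], table)))
```

The routing broker never parses the XML payload in this sketch, which is what lets it stay simple and avoids inspecting message content.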
Web service-based Event Service (WS-Messenger Local Event Service)
Focus on meeting the needs of local services
Expressiveness
Web services
Addressing local load scalability
Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid06).
New Requirements by SOA
Services are autonomous => may need format transformation
Integrate heterogeneous services and work with existing event buses
– Java, C++, C#, Windows, Linux, Unix, OpenJMS, ActiveMQ, …
XML processing
XPath-based filtering
Internet-scale
– Services from different organizations in different locations
Shared global event service for SOA
Evolution of Pub/Sub Specifications
CORBA (Common Object Request Broker Architecture) Event Service (3/1995)
CORBA Notification Service (6/1997)
Java Message Service (JMS) (4/2002)
OGSI-Notification (6/2003)
WS-Notification (1/2004, OASIS standard 10/06)
WS-Eventing (1/2004, submitted to W3C in 3/06)
Y. Huang and D. Gannon, "A Comparative Study of Web Services-based Event Notification Specifications," Proc. of Workshop on Web Services-based Grid Applications (WSGA), 2006
WS-Messenger Broker
Implements both the WS-Eventing specification and the WS-Notification specification
Mediation approach to reconcile conflicts between WS-Eventing and WS-Notification
A generic interface to wrap existing local JMS messaging systems.
– Adapters created: OpenJMS, ActiveMQ, NaradaBrokering
Efficient XML processing
– Over 4 times faster than the Globus Toolkit implementation on XML message processing
Supports very expressive subscriptions
Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," (CCGrid06).
Architecture
OpenPS Broker Network
Addressing geographic scalability and administrative scalability
Creating a sharable global event service without losing expressiveness
Design Goals of OpenPS
Sharable by multiple unrelated projects
Can integrate heterogeneous services and work with existing event buses
– Java, C++, C#, Windows, Linux, Unix, OpenJMS, ActiveMQ, …
Use Web services as the “magic bullet”
Internet-scale
– Can integrate services across organizations around the world
– Support collaboration among brokers from different vendors
XML or simple messages as message payload
Reliable
– Critical communication foundation
Scalable
– Expect high load in a short time and increasing load over time
High performance
– Near real-time message delivery
Implementation
Filtering Result Summary (FRS)
Currently, topics and XPath expressions are simply used as the FRS
Embedded in the HTTP header
Other FRS formats are allowed, e.g. encrypted strings, unique numbers, etc.
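As an illustration of carrying an FRS in an HTTP header, the sketch below percent-encodes each expression and joins them with commas. The header name `X-FRS` and the encoding are assumptions made for this example; the slides do not state which header or format OpenPS uses.

```python
from urllib.parse import quote, unquote

FRS_HEADER = "X-FRS"  # hypothetical header name, not taken from OpenPS


def encode_frs(expressions: list) -> dict:
    """Pack topic or XPath strings into a single header value."""
    # Percent-encoding keeps quotes, brackets and commas out of the raw value.
    return {FRS_HEADER: ",".join(quote(e, safe="") for e in expressions)}


def decode_frs(headers: dict) -> list:
    """Recover the FRS strings from the headers of a received message."""
    raw = headers.get(FRS_HEADER, "")
    return [unquote(part) for part in raw.split(",") if part]


headers = encode_frs(['/a/b[@x="1"]', "weather/storms"])
print(headers[FRS_HEADER])
print(decode_frs(headers))
```

The header grows with the number of expressions attached to a message, so a long FRS makes every message larger.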
Performance Comparison
2500 subscriptions (200 unique) for 50 consumers
FRS achieved efficient message matching.
Future work: improve subscription scalability (the FRS may get too long).
Evaluation on PlanetLab
PlanetLab is a global research network for developing, deploying and accessing planetary-scale services.
Consists of 804 nodes at 391 sites
http://www.telegeography.com/products/map_internet/index.php
Deployment
Deployed to over 300 nodes
Overlay network follows physical network
Created scripts to automatically start, update, and stop services
Latency (LAN)
• Local broker processing dominates (message size: 7079 bytes)
Latency (WAN)
• Delay in OpenPS nodes dominates (message size: 7079 bytes)
• Caused by the IP MTU (1.5 KB), which is much smaller than the message
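A rough calculation shows why the MTU matters for a 7079-byte message. The sketch assumes a 1500-byte MTU and 20-byte IPv4 and TCP headers without options, and it ignores HTTP and SOAP header overhead, which would add more bytes; the function name is illustrative.

```python
import math


def packets_needed(message_bytes: int, mtu: int = 1500,
                   ip_header: int = 20, tcp_header: int = 20) -> int:
    """Number of full-size TCP segments needed to carry the message body."""
    payload_per_packet = mtu - ip_header - tcp_header  # 1460 bytes by default
    return math.ceil(message_bytes / payload_per_packet)


print(packets_needed(7079))  # 5
```

Each message therefore crosses the WAN as several packets rather than one, and on a long-RTT path the delivery time depends on how many of them the sender is allowed to have in flight at once.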
Throughput (WAN)
[Figure: test setup between the United States and Germany. Components shown: publishers, publisher local brokers (PLBs), OpenPS nodes (POnode, COnode), consumer local brokers (CLB) and a consumer. Host names shown: bleu, agni, hunk, exodus, part, linbox3, rogue and planetlab02/04/05/06.mpi-sws.mpg.de. One link is labelled 69.2 ms. Checkpoints: publishing rate, receiving rate, consumer receiving rate.]
Throughput (WAN)
[Figure: throughput vs. number of threads]
• Can achieve about 200-300 messages/sec with 600 XPath subscriptions, 7079-byte XML messages and 400 threads
• Still has room to improve, e.g. wrapping up messages
Evaluation Results Summary
Latency
– In LAN, local broker processing dominates latency.
– In WAN, delay in OpenPS nodes dominates latency due to network latency
Throughput
– Used a thread pool to compensate for latency and achieve higher throughput
– Can achieve about 200-300 msg/s with 600 XPath subscriptions, 7079-byte XML messages and 400 threads (from Indiana to Germany)
  • Compared to about 3 msg/s using a local broker alone
  • Compared to about 20 msg/s processing capacity for PL nodes
– Still has room to improve, e.g. wrapping up messages
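The effect of a thread pool on a high-latency link can be illustrated with a small simulation and with Little's law (messages in flight = throughput × time each message spends in flight). The sketch below uses `time.sleep` as a stand-in for a blocking WAN send; the function names and the 50 ms delay are assumptions for the demonstration, not measurements from the evaluation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send(message, latency_s: float = 0.05):
    """Stand-in for a blocking send that waits one WAN round trip."""
    time.sleep(latency_s)
    return message


def measured_rate(messages, threads: int) -> float:
    """Messages per second achieved with the given number of sender threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        delivered = list(pool.map(send, messages))
    return len(delivered) / (time.perf_counter() - start)


def implied_time_in_flight(threads: int, throughput: float) -> float:
    """Little's law, assuming every thread always has one message in flight."""
    return threads / throughput


print(measured_rate(range(10), threads=1) < measured_rate(range(10), threads=10))
print(implied_time_in_flight(400, 200))  # 2.0
```

Under the stated assumption that all 400 threads stay busy, the reported 200-300 msg/s would correspond to roughly 1.3 to 2 seconds in flight per message; the slides do not report this figure directly.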
Event Service in Service-oriented Scientific Workflow (LEAD project)
The LEAD Project
Create infrastructure for better predictions of severe weather
The LEAD Project
• $11.25M over 5 years since 2003
Traditional Methodology
[Figure: static observations (radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar orbiting satellites, wind profilers, GPS satellites) feed analysis/assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields), then prediction/detection (PCs to teraflop systems), then product generation, display and dissemination to end users (NWS, private companies, students).]
The process is entirely serial and static (pre-scheduled): no response to the weather!
Major Paradigm Shift: CASA NETRAD adaptive Doppler radars.
The LEAD Vision: Adaptive Cyberinfrastructure
[Figure: dynamic observations feed analysis/assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields), prediction/detection (PCs to teraflop systems), and product generation, display and dissemination to end users (NWS, private companies, students); models and algorithms drive the sensors.]
The CS challenge: build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response.
The Service Architecture
Use services
– to promote a pluggable framework
– to add new features
– to virtualize access to resources
A Closer Look at the LEAD Service Architecture
[Figure: components shown: user's browser, portal server, data catalog service, MyLEAD user metadata catalog, MyLEAD agent service, data management service, workflow engine (workflow graph), provenance collection service, app factory, application services, compute engine, data storage, and the event notification bus.]
Tied together with an Internet-scale event bus.
Example LEAD Workflow
Event Service in LEAD Workflows
Y. Huang, A. Slominski, et al., "WS-Messenger: A Web Services based Messaging System for Service-Oriented Grid Computing," 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid06).
Notification Views
[Figure: notification views in the real-time monitor and in MyLEAD]
Techniques for Firewalls
Problem: How to deliver events to consumers behind firewalls?
Solution: Use the MessageBox Web service
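The idea can be sketched as a store-and-forward mailbox: the broker pushes notifications into a box hosted outside the firewall, and the consumer pulls them over connections it opens itself. The in-memory class below is an illustrative sketch under that assumption; its method names are hypothetical and do not reproduce the MessageBox Web service interface.

```python
import uuid
from collections import deque


class MessageBox:
    """In-memory stand-in for a mailbox service reachable by both sides."""

    def __init__(self):
        self._boxes = {}

    def create_box(self) -> str:
        """Called by the consumer; the returned id is used as its delivery address."""
        box_id = uuid.uuid4().hex
        self._boxes[box_id] = deque()
        return box_id

    def store(self, box_id: str, message: str) -> None:
        """Called by the broker when it delivers a notification."""
        self._boxes[box_id].append(message)

    def take(self, box_id: str, max_messages: int = 10) -> list:
        """Called by the consumer from behind the firewall (outbound poll)."""
        box = self._boxes[box_id]
        count = min(max_messages, len(box))
        return [box.popleft() for _ in range(count)]


service = MessageBox()
address = service.create_box()
service.store(address, "<event>workflow started</event>")
print(service.take(address))
```

Because the consumer only makes outbound requests in this arrangement, no inbound port has to be opened in its firewall.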
Tools: Subscription Manager Interface
Check and delete subscriptions on different brokers from one simple interface
Tools: Event Notification Message Viewer
[Figure labels: Debug; Monitor; Firewall]
Conclusions
Extended the existing state of the art in publish/subscribe systems to Web services and SOA
– Mediation among competing WS specifications
– Achieved load scalability, geographic scalability and administrative scalability
Reduced message filtering to 1-to-1 (topic-based) and any-to-any (content-based) matching
A layered model for an Internet-scale sharable publish/subscribe service
– Separation of concerns and economies of scale
Applied to a real-world SOA project (LEAD workflow)
[Figure labels: Competing Specification Problem; Expressiveness vs. Scalability Problem; Dynamic Workflow Orchestration; Event Application, Event Transport, Event Distribution, Event Filtering and Event Transformation Layers]
Thank You!