CS632 1
Query Optimization Over Web Services
Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, Rajeev Motwani
Presented By Ajay Kumar Sarda
Motivation
Web services emerging as a popular standard for sharing data and functionality
Databases behind web services
DBMS-like capabilities when data sources are web services
Need for query optimization for queries spanning multiple web services
Motivating Example
A credit card company wants to send out mailings for its new credit card offer.
I: Potential recipient names
WS1: name (n) → credit rating (cr)
WS2: name (n) → credit card number (ccn)
WS3: card number (ccn) → payment history (ph)
One possible execution order is WS1, WS2, WS3
Is it optimal?
Challenges
Different response times of web services
Precedence constraints
Tradeoff between linear pipeline and parallelism
Overhead of parsing SOAP/XML headers
Related Work
Query optimization in the presence of limited access patterns
Binding pattern R(A^b, B^f)
Annotated query plans in the search space; prunes invalid and non-viable plans
Starts with an initial set S of plans containing only atomic plans
S is iteratively updated by adding new plans obtained by combining plans from S using selection and join operations
Outline of the Talk
WSMS Preliminaries
Query Optimization with and without precedence constraints
Data Chunking
Experimental Evaluation
Conclusion
Future work
WSMS Architecture
Query Model
Web service WSi is denoted as WSi(Xi^b, Yi^f)
Xi - bound attributes
Yi - free attributes
Query Model (Contd.)
Query Plans
Execution Model
A thread Ti is created for each web service WSi
Ti takes its input from join thread Ji
Ji joins the outputs of the parents of WSi
Jout joins the outputs of all leaf web services
Execution Model (Contd.)
Statistics
Per-tuple response time (ci)
ci = 1/ri, where ri is the maximum rate at which results of invocations can be obtained from WSi
Depends on web service provisioning, network
conditions and load on the web service
Selectivity (Si)
Average number of returned tuples, per input tuple, that remain unfiltered after applying predicates
Si <= 1 (selective) or Si > 1 (proliferative)
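As a rough illustration of the two statistics above, the following Python sketch derives a per-tuple response time and a selectivity from profiling counts. The function name and its arguments are hypothetical; the slides do not describe how the WSMS actually collects these numbers.

```python
def estimate_stats(tuples_in, tuples_out, busy_seconds):
    """Estimate (c_i, s_i) for one web service from hypothetical profiling counts.

    tuples_in    -- input tuples sent to the service
    tuples_out   -- tuples that remain after the service's results are filtered
    busy_seconds -- time the service spent working at its maximum rate
    """
    if tuples_in <= 0 or busy_seconds <= 0:
        raise ValueError("need at least one input tuple and a positive time")
    rate = tuples_in / busy_seconds   # r_i: tuples handled per second
    c = 1.0 / rate                    # c_i = 1 / r_i
    s = tuples_out / tuples_in        # s_i: surviving tuples per input tuple
    return c, s


c, s = estimate_stats(tuples_in=200, tuples_out=50, busy_seconds=100.0)
kind = "selective" if s <= 1 else "proliferative"
```

With these assumed counts the service handles two tuples per second, so c is 0.5 seconds per tuple, and one tuple in four survives, which makes the service selective.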
Bottleneck Cost Metric
Query plan H
Pi(H) - the set of predecessors of WSi in H
R[S] - the combined selectivity (product of selectivities) of all the web services in S
For every tuple in I input to plan H, the average number of tuples that WSi needs to process is R[Pi(H)]
Average processing time required by WSi per original input tuple in I is R[Pi(H)] * Ci
Cost of the query plan H
cost(H) = max over all WSi of (R[Pi(H)] * Ci)
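The metric above can be sketched in Python as follows. This is only an illustration: the plan is represented as a dictionary from each web service to its direct parents (a representation assumed here, not taken from the slides), R[S] is computed as a plain product of selectivities, and the example numbers are made up.

```python
from math import prod


def predecessors(ws, parents):
    """All transitive predecessors Pi(H) of `ws` in the plan DAG."""
    seen, stack = set(), list(parents.get(ws, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def bottleneck_cost(parents, cost, sel):
    """max over WSi of R[Pi(H)] * Ci, with R[S] the product of selectivities in S."""
    return max(
        prod(sel[p] for p in predecessors(ws, parents)) * cost[ws]
        for ws in cost
    )


# Hypothetical numbers, not taken from the slides.
cost = {"WS1": 2.0, "WS2": 10.0, "WS3": 5.0}
sel = {"WS1": 0.1, "WS2": 0.5, "WS3": 1.0}

linear = {"WS2": ["WS1"], "WS3": ["WS2"]}  # WS1 -> WS2 -> WS3
parallel = {}                               # every service reads the input directly

linear_cost = bottleneck_cost(linear, cost, sel)      # max(2, 0.1*10, 0.05*5)
parallel_cost = bottleneck_cost(parallel, cost, sel)  # max(2, 10, 5)
```

In this made-up case the pipeline's bottleneck is WS1 at 2.0, while dispatching in parallel leaves the slow WS2 exposed to every input tuple and gives 10.0.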
Bottleneck Cost Metric (Contd.)
Plan 1: max(2*I, 10*0.1*I, 5*0.5*I) = 2.5*I
Plan 2: max(2*I, 10*I, 5*5*I) = 25*I
Plan 2 is 10 times slower than plan 1
Q.O without Precedence Constraints
Lemma: “There exists an optimal plan that is a linear ordering of the selective web services, i.e., has no parallel dispatch of data.”
Q.O without Precedence Constraints
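Lemma: “Let WS1, . . . , WSn be a plan with a linear ordering of the selective web services. If ci > ci+1, then WSi and WSi+1 can be swapped without increasing the cost of the plan.”
Given ci > ci+1, with Fi the average number of tuples reaching position i per input tuple:
Before the swap: WSi (si, ci) costs Fi*ci, then WSi+1 (si+1, ci+1) costs Fi*si*ci+1
After the swap: WSi+1 (si+1, ci+1) costs Fi*ci+1, then WSi (si, ci) costs Fi*si+1*ci
Both costs after the swap are at most Fi*ci, since ci+1 < ci and si+1 <= 1, so the bottleneck does not increase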
Q.O without Precedence Constraints(Contd.)
Theorem: “For selective web services with no precedence constraints, the optimal plan is a linear ordering of the web services by increasing response time, ignoring selectivities.”
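The theorem can be checked on a small case with the Python sketch below. The service names and numbers are hypothetical, and the brute-force search covers linear orderings only, which is enough here because the earlier lemma says an optimal plan exists among them.

```python
from itertools import permutations


def linear_cost(order, cost, sel):
    """Bottleneck cost of a linear pipeline: max_i (product of earlier selectivities) * c_i."""
    worst, reaching = 0.0, 1.0
    for ws in order:
        worst = max(worst, reaching * cost[ws])
        reaching *= sel[ws]
    return worst


def order_by_cost(cost):
    """Ordering by increasing response time; selectivities are never consulted."""
    return sorted(cost, key=cost.get)


# Hypothetical selective services (every selectivity <= 1).
cost = {"WS1": 2.0, "WS2": 0.5, "WS3": 1.0}
sel = {"WS1": 0.1, "WS2": 0.9, "WS3": 0.5}

plan = order_by_cost(cost)
plan_cost = linear_cost(plan, cost, sel)
best_by_search = min(linear_cost(p, cost, sel) for p in permutations(cost))
```

Here the cost-ordered plan is WS2, WS3, WS1, and its bottleneck cost matches the minimum found over all six orderings, even though WS1 is by far the most selective service.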
Q.O with Precedence Constraints
Constructs the plan DAG H incrementally by greedily adding to it one web service at a time
Web service chosen should be the one that can be added to H with minimum cost, and all of whose prerequisite web services have already been added to H
Mi - the set of all web services that are prerequisites for WSi
Adding a Web Service to the Plan
Given a partial plan H-bar, add WSx
Compute the best cut Cx such that, on placing edges from the web services in Cx to WSx, the cost is minimized
PCx - the set of all the web services in Cx and all their predecessors in H-bar
Cost incurred by adding WSx is
Cost(WSx) = R[PCx] * cx, where cx is the response time of WSx (not the cut Cx)
Adding a Web Service (Contd.)
A variable zi is associated with every WSi, set to 1 if WSi belongs to PCx
Optimal set PCx is obtained by solving an LP problem
Greedy Algorithm
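A minimal Python sketch of the greedy construction follows. It is not the method from the talk in one important respect: the best cut is found by brute force over subsets of the services already placed, instead of by solving the LP, so it only suits a handful of services. The function names are hypothetical. The constraints and selectivities are those of the precedence experiment in these slides, while the uniform cost of 1.0 is an assumed value.

```python
from itertools import combinations
from math import prod


def best_cut(x, preds, prereq, cost, sel):
    """Cheapest way to attach x to the partial plan, found by brute force.

    preds maps every placed service to the set of all its predecessors.
    Returns (cost of adding x, PCx).
    """
    placed = sorted(preds)
    best = None
    for k in range(len(placed) + 1):
        for cut in combinations(placed, k):
            pc = set(cut) | set(prereq[x])   # the cut must cover the prerequisites
            for w in list(pc):
                pc |= preds[w]               # close the set under predecessors
            value = prod(sel[w] for w in pc) * cost[x]
            if best is None or value < best[0]:
                best = (value, pc)
    return best


def greedy_plan(prereq, cost, sel):
    """Add one service at a time, always the ready one that is cheapest to add."""
    preds, order = {}, []
    while len(order) < len(cost):
        ready = [x for x in cost
                 if x not in preds and set(prereq[x]) <= set(preds)]
        if not ready:
            raise ValueError("precedence constraints are cyclic")
        scored = [(x, *best_cut(x, preds, prereq, cost, sel)) for x in ready]
        x, _, pc = min(scored, key=lambda t: t[1])
        preds[x] = pc
        order.append(x)
    return order, preds


prereq = {"WS1": set(), "WS2": set(), "WS3": {"WS1"}, "WS4": {"WS2"}}
sel = {"WS1": 2.0, "WS2": 1.0, "WS3": 0.1, "WS4": 0.1}
cost = {"WS1": 1.0, "WS2": 1.0, "WS3": 1.0, "WS4": 1.0}   # assumed uniform cost

order, preds = greedy_plan(prereq, cost, sel)
```

Under these assumptions WS3 is added last and placed after WS4 as well as after its prerequisite WS1, because WS4's selectivity of 0.1 offsets the proliferation introduced by WS1.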
Data Chunking
Parsing SOAP/XML headers and network cost are overhead on each web service call
Pass tuples to a web service in chunks
Response time of WSi depends on input chunk size
ci(k) – response time of WSi on a chunk of size k
A limit ki^max exists on the maximum chunk size
Data Chunking (Contd.)
Query Optimizer must decide on optimal chunk size for each web service
“The optimal chunk size to be used by WSi is ki* such that ci(ki*)/ki* is minimized”
Profiling combined with query processing for trying out various chunk sizes
Intermediate tuples between any two web services in the pipelined plan are buffered
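The chunk-size rule quoted above can be sketched in Python as follows, assuming response times have already been profiled for a few chunk sizes. The function name and the profile values are hypothetical.

```python
def best_chunk_size(response_time, k_max):
    """Return the profiled chunk size k <= k_max that minimises c(k) / k.

    response_time maps a chunk size k to the measured response time c(k).
    """
    candidates = {k: t for k, t in response_time.items() if 1 <= k <= k_max}
    if not candidates:
        raise ValueError("no profiled chunk size within the limit")
    return min(candidates, key=lambda k: candidates[k] / k)


# Hypothetical profile: seconds taken for chunks of 1, 10, 50 and 100 tuples.
profile = {1: 1.0, 10: 2.0, 50: 6.0, 100: 15.0}

k_star = best_chunk_size(profile, k_max=100)   # per-tuple times: 1.0, 0.2, 0.12, 0.15
k_capped = best_chunk_size(profile, k_max=20)  # only k = 1 and k = 10 are allowed
```

The per-tuple time falls as the fixed call overhead is amortized, then rises again for the largest chunk in this made-up profile, so the minimum sits at 50 unless the limit on chunk size forces a smaller value.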
Experimental Evaluation
Total running time as metric
Compare the plans produced by the optimizer against:
Parallel - dispatch data to the web services in parallel
SelOrder - choose the web service with lower selectivity first
Compare the running time with and without chunking
Compare the WSMS cost against the slowest web service
Experimental Setup
WSMS prototype is a multithreaded system in Java
Apache Axis tools for communicating with web services
Java Reflection
Different costs by varying delays
Different selectivities by rejecting a tuple with probability 1-Si
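The last two points of the setup can be imitated with a small Python stand-in, sketched below. It is not the Java prototype: the function name is hypothetical, the delay is applied per input tuple as an assumption, and rejection sampling only covers selectivities between 0 and 1, so a proliferative service would need a different mechanism.

```python
import random
import time


def make_simulated_service(selectivity, delay_seconds, seed=0):
    """Build a stand-in web service with a fixed delay and a selectivity <= 1."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("rejection sampling only models selectivity in [0, 1]")
    rng = random.Random(seed)

    def call(tuples):
        time.sleep(delay_seconds * len(tuples))  # cost: a delay per input tuple
        # Reject each tuple with probability 1 - selectivity.
        return [t for t in tuples if rng.random() < selectivity]

    return call


ws = make_simulated_service(selectivity=0.3, delay_seconds=0.0)
kept = ws(list(range(10_000)))
fraction_kept = len(kept) / 10_000  # close to 0.3 on average
```

Fixing the seed makes a run repeatable, which helps when comparing plans on the same simulated services.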
No Precedence Constraints
WS1,WS2,WS3,WS4
Selectivities set as 0.4,0.3,0.2,0.1
Range of cost c varied from [0.2,2] to [2,2]
Parallel – WS4
SelOrder – WS4
Precedence Constraints
WS1,WS2,WS3,WS4
WS1 < WS3,WS2 < WS4
Selectivities: 2, 1, 0.1, 0.1
Uniform cost for WS1, WS2, WS3, with the cost of WS4 varied from 0.4 to 2
Data Chunking
WS1,WS2,WS3,WS4
No precedence constraints
Uniform cost
Selectivity set to 0.5
Web Services are arranged in linear pipeline (Optimizer)
Equal chunk size
WSMS Cost Vs Bottleneck Cost
No precedence constraints
Uniform web service costs
Selectivity set to 0.5
Web Services arranged in linear pipeline
Future Work
Different input tuples to follow different plans
Adaptive plans that change with response times
Web Services with monetary costs
Multiple web services for same data
Profiling techniques that track response time and selectivities
Caching Techniques at WSMS
Conclusion
Web Service Management System
Bottleneck cost – cost of pipelined plan
Optimal pipelined plan respecting precedence constraints
Optimal chunk size
References
Query Optimization over Web Services. U. Srivastava, J. Widom, K. Munagala, and R. Motwani
Query optimization in the presence of limited access patterns. In Proc. of ACM SIGMOD Conf. on Management of Data
Thank You!