
CS632 1

Query Optimization Over Web Services

Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, Rajeev Motwani

Presented By Ajay Kumar Sarda


Motivation

Web services emerging as a popular standard for sharing data and functionality

Databases behind web services

DBMS-like capabilities when data sources are web services

Need for query optimization for queries spanning multiple web services


Motivating Example

A credit card company wants to send out mailings for its new credit card offer.

I: Potential recipient names

WS1: name (n) -> credit rating (cr)

WS2: name (n) -> credit card number (ccn)

WS3: card number (ccn) -> payment history (ph)

One possible execution is WS1, WS2, WS3

Is it optimal?

Challenges

Different response time of web services

Precedence constraints

Tradeoff between linear pipeline and parallelism

Overhead of parsing SOAP/XML headers

Related Work

Query optimization in the presence of limited access patterns

Binding pattern R(A^b, B^f): A is bound, B is free

Uses annotated query plans in the search space; prunes invalid and non-viable plans

Starts with an initial set S of plans containing only atomic plans

S is iteratively updated by adding new plans obtained by combining plans from S using selection and join operations


Outline of the Talk

WSMS

Preliminaries

Query Optimization with and without precedence constraints

Data Chunking

Experimental Evaluation

Conclusion

Future work


WSMS Architecture


Query Model

Web service denoted as WSi(Xi^b, Yi^f)

Xi: bound attributes

Yi: free attributes


Query Model (Contd.)


Query Plans


Execution Model

A thread Ti is created for each web service WSi

Ti takes its input from join thread Ji

Ji joins the outputs of the parents of WSi

Jout joins the outputs of all leaf web services


Execution Model (Contd.)


Statistics

Per-tuple response time (ci)

ci = 1/ri, where ri is the maximum rate at which results of invocations can be obtained from WSi

Depends on web service provisioning, network conditions, and load on the web service

Selectivity (si)

Average number of returned tuples, per input tuple, that remain unfiltered after applying predicates

si <= 1 (selective) or si > 1 (proliferative)


Bottleneck Cost Metric

Query plan H

Pi(H): the set of predecessors of WSi in H

R[S]: the combined selectivity of all the web services in S, i.e. the product of their selectivities

For every tuple in I input to plan H, the average number of tuples that WSi needs to process is R[Pi(H)]

Average processing time required by WSi per original input tuple in I is R[Pi(H)] * ci

Cost of the query plan H: the maximum over all WSi of R[Pi(H)] * ci
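The bottleneck cost metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bottleneck_cost`, the service names A, B, C, and all the numbers are hypothetical and do not come from the slides. Each plan is given as a mapping from a web service to the full set of its predecessors Pi(H).

```python
from math import prod

def bottleneck_cost(predecessors, cost, sel):
    """Return the maximum over all WSi of R[Pi(H)] * ci for a plan H.

    predecessors[ws] is Pi(H): the set of ALL predecessors of ws in H
    (not only its direct parents).
    """
    return max(
        prod(sel[p] for p in preds) * cost[ws]  # R[Pi(H)] * ci
        for ws, preds in predecessors.items()
    )

# Hypothetical statistics, chosen only for illustration.
cost = {"A": 2.0, "B": 10.0, "C": 5.0}   # per-tuple response times
sel = {"A": 0.1, "B": 0.5, "C": 1.0}     # selectivities

# Two linear plans over the same services: A -> B -> C and C -> B -> A.
a_first = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
c_first = {"C": set(), "B": {"C"}, "A": {"C", "B"}}

print(bottleneck_cost(a_first, cost, sel))
print(bottleneck_cost(c_first, cost, sel))
```

With these made-up numbers the first plan is limited by A at 2 time units per input tuple, while the second is limited by B at 10, so placing the cheap, highly selective service first gives a plan five times faster.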


Bottleneck Cost Metric (Contd.)

Plan 1: max(2*I, 10*0.1*I, 5*0.5*I) = 2.5*I

Plan 2: max(2*I, 10*I, 5*5*I) = 25*I

Plan 2 is 10 times slower than Plan 1


Q.O without Precedence Constraints

Lemma: “There exists an optimal plan that is a linear ordering of the selective web services, i.e., has no parallel dispatch of data.”

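Q.O without Precedence Constraints

Lemma: “Let WS1, ..., WSn be a plan with a linear ordering of the selective web services. If ci > ci+1, then WSi and WSi+1 can be swapped without increasing the cost of the plan.”

Case ci > ci+1, where Fi is the average number of tuples reaching position i per input tuple:

Before the swap: WSi (si, ci) costs Fi*ci and WSi+1 (si+1, ci+1) costs Fi*si*ci+1

After the swap: WSi+1 costs Fi*ci+1 and WSi costs Fi*si+1*ci

Both costs after the swap are at most Fi*ci, because ci+1 < ci and si+1 <= 1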


Q.O without Precedence Constraints(Contd.)

Theorem: “For selective web services with no precedence constraints, the optimal plan is a linear ordering of the web services by increasing response time, ignoring selectivities.”
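The theorem can be checked on a small instance with a short Python sketch. The selectivities below are the ones used later in the experiments (0.4, 0.3, 0.2, 0.1); the response times and the function names `linear_plan_cost` and `order_by_response_time` are assumptions made only for this illustration. The sketch sorts the services by response time and compares the resulting bottleneck cost with the best cost found by trying every linear ordering.

```python
from itertools import permutations

def linear_plan_cost(order, cost, sel):
    """Bottleneck cost of a linear pipeline that runs services in `order`."""
    worst, reaching = 0.0, 1.0            # reaching = tuples arriving per input tuple
    for ws in order:
        worst = max(worst, reaching * cost[ws])
        reaching *= sel[ws]               # later services see fewer tuples
    return worst

def order_by_response_time(cost):
    """Order web services by increasing response time, ignoring selectivities."""
    return sorted(cost, key=cost.get)

# Hypothetical response times; all services are selective (si <= 1).
cost = {"WS1": 1.5, "WS2": 0.4, "WS3": 2.0, "WS4": 0.9}
sel = {"WS1": 0.4, "WS2": 0.3, "WS3": 0.2, "WS4": 0.1}

plan = order_by_response_time(cost)
best = min(linear_plan_cost(p, cost, sel) for p in permutations(cost))

print(plan, linear_plan_cost(plan, cost, sel), best)
```

The exhaustive search is only there to confirm the sorted order on this toy input; it is exponential and would not be part of an optimizer.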


Q.O with Precedence Constraints

Constructs the plan DAG H incrementally by greedily adding to it one web service at a time

Web service chosen should be the one that can be added to H with minimum cost, and all of whose prerequisite web services have already been added to H

Mi: the set of all web services that are prerequisites for WSi


Adding a Web Service to the Plan

Given a partial plan H̄ (H-bar), add WSx

Compute the best cut Cx such that on placing edges from the web services in Cx to WSx, cost is minimized

PCx: set of all the web services in Cx and all their predecessors in H̄

Cost incurred by adding WSx is

Cost(WSx) = R[PCx] * cx, where cx is the per-tuple response time of WSx


Adding a Web Service (Contd.)

A variable Zi is associated with every WSi, set to 1 if WSi belongs to PCx

Optimal set PCx obtained by solving an LP problem
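To make the choice of PCx concrete, here is a minimal Python sketch that finds the best predecessor set by exhaustive enumeration. This is deliberately not the LP formulation mentioned on the slide: it is a brute-force stand-in that is exponential in the size of the partial plan and is only meant to show what is being minimized. The function names, the `parents` representation, and the example plan are assumptions; the selectivities 2, 1, 0.1 and the constraint WS1 < WS3 are borrowed from the precedence-constraint experiment.

```python
from itertools import combinations
from math import prod

def ancestors(ws, parents):
    """All predecessors of ws in the partial plan (transitive parents)."""
    seen, stack = set(), list(parents[ws])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

def best_predecessor_set(parents, sel, prereqs):
    """Brute-force search for the set PCx with the smallest R[PCx].

    A candidate set must contain every prerequisite of WSx and must be
    closed under predecessors in the partial plan.
    """
    nodes = list(parents)
    best_set, best_r = None, float("inf")
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if not prereqs <= chosen:
                continue
            if any(not ancestors(ws, parents) <= chosen for ws in chosen):
                continue
            r = prod(sel[ws] for ws in chosen)   # combined selectivity R[PCx]
            if r < best_r:
                best_set, best_r = chosen, r
    return best_set, best_r

# Hypothetical partial plan: WS1 -> WS3, with WS2 on its own branch.
parents = {"WS1": [], "WS2": [], "WS3": ["WS1"]}
sel = {"WS1": 2.0, "WS2": 1.0, "WS3": 0.1}

# Add WS4, whose only prerequisite is WS2 (WS2 < WS4).
pc_x, r = best_predecessor_set(parents, sel, prereqs={"WS2"})
print(sorted(pc_x), r)
```

In this example the search places WS4 after all three services: taking on the proliferative WS1 is worthwhile because the highly selective WS3 then also precedes WS4, giving a combined selectivity of about 0.2 instead of 1.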


Greedy Algorithm


Data Chunking

Parsing SOAP/XML headers and network cost are overhead on every web service call

Pass tuples to a web service in chunks

Response time of WSi depends on input chunk size

ci(k): response time of WSi on a chunk of size k

A limit ki^max exists on the maximum chunk size


Data Chunking (Contd.)

Query Optimizer must decide on optimal chunk size for each web service

“The optimal chunk size to be used by WSi is ki* such that ci(ki*)/ki* is minimized”

Profiling combined with query processing for trying out various chunk sizes

Intermediate tuples between any two web services in the pipelined plan are buffered
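The chunk-size rule can be illustrated with a small Python sketch: given a response-time function ci(k) and the limit on chunk size, pick the k that minimizes the per-tuple time ci(k)/k. The function `optimal_chunk_size` and the response-time model `measured` are hypothetical; in practice ci(k) would come from profiling, as the slide notes.

```python
def optimal_chunk_size(response_time, k_max):
    """Return the k in 1..k_max that minimizes response_time(k) / k."""
    return min(range(1, k_max + 1), key=lambda k: response_time(k) / k)

def measured(k):
    """Made-up response-time model for a chunk of k tuples.

    50.0 stands for fixed per-call overhead (e.g. header parsing),
    2.0 * k for per-tuple work, and 0.05 * k * k for a penalty that
    grows as chunks get large. These numbers are assumptions.
    """
    return 50.0 + 2.0 * k + 0.05 * k * k

print(optimal_chunk_size(measured, 100))
print(optimal_chunk_size(measured, 10))
```

With this model the per-tuple time falls as the fixed overhead is spread over more tuples and then rises again, so the unconstrained optimum is 32 tuples per chunk; when the limit is 10, the best choice is the limit itself.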


Experimental Evaluation

Total running time as metric

Compare the plans produced by the optimizer against:

Parallel: dispatch data to the web services in parallel

SelOrder: choose the web service with lower selectivity first

Compare the running time with and without chunking

Compare the WSMS cost against the cost of the slowest web service


Experimental Setup

WSMS prototype is a multithreaded system in Java

Apache Axis tools for communicating with web services

Java Reflection

Different costs obtained by varying delays

Different selectivities obtained by rejecting each tuple with probability 1 - si


No Precedence Constraints

WS1,WS2,WS3,WS4

Selectivities set as 0.4,0.3,0.2,0.1

Range of cost c varied from [0.2,2] to [2,2]

Parallel: WS4

SelOrder: WS4


Precedence Constraints

WS1,WS2,WS3,WS4

WS1 < WS3, WS2 < WS4

Selectivities: 2, 1, 0.1, 0.1

Uniform cost of WS1, WS2, WS3, with the cost of WS4 varied from 0.4 to 2


Data Chunking

WS1,WS2,WS3,WS4

No precedence constraints

Uniform cost

Selectivity set to 0.5

Web services are arranged in a linear pipeline (Optimizer)

Equal chunk size


WSMS Cost vs. Bottleneck Cost

No precedence constraints

Uniform web service costs

Selectivity set to 0.5

Web services arranged in a linear pipeline


Future Work

Different input tuples to follow different plans

Adaptive plans that change with response times

Web services with monetary costs

Multiple web services for the same data

Profiling techniques that track response times and selectivities

Caching techniques at the WSMS


Conclusion

Web Service Management System

Bottleneck cost: cost of a pipelined plan

Optimal pipelined plan respecting precedence constraints

Optimal chunk size

References

U. Srivastava, J. Widom, K. Munagala, and R. Motwani. Query Optimization over Web Services.

Query optimization in the presence of limited access patterns. In Proc. of ACM SIGMOD Conf. on Management of Data


Thank You!
