ObjectGlobe: Open, Secure, and QoS-enhanced Distributed Query Processing

Page 1: ObjectGlobe  Open, Secure, and QoS-enhanced Distributed Query Processing

ObjectGlobe: Open, Secure, and QoS-enhanced Distributed Query Processing

Donald Kossmann

Technical University of Munich

http://www3.in.tum.de

Joint work with Alfons Kemper (Passau) and others

Page 2

Outline

• Background

• The ObjectGlobe Lookup Service

• (Security Aspects)

• QoS Management

• Summary

Page 3

Query Processing on the Internet

• Web servers, relational databases on the Web: centralized or limited query capabilities

• Middleware systems: a great deal of data shipping

• Goals of ObjectGlobe:
  – integrate any kind of data
  – integrate any kind of query processing capabilities
  – bring query processing capabilities to the data

Page 4

Middleware for Query Processing

[Figure: Data-Provider A stores relations R and S; Data-Provider B stores relation T. The user-defined operators wrap_S and thumbnail run in the middleware, which receives copies of the relations. Caption: Heavy data shipping.]

Page 5

Open Query Processing (Step 1)

[Figure: a Fct-Provider offers the operators wrap_S and thumbnail; Data-Provider A (relations R, S) and Data-Provider B (relation T); arrows labeled "Load functions".]

Page 6

Open Query Processing (Step 2)

[Figure: Fct-Provider with the operators wrap_S and thumbnail; relations R, S, and T; Data-Provider A and Data-Provider B; arrows labeled "Load functions".]

Page 7

Traveling from M to UCB

[Figure: query plan. Two data providers (flights, rental cars), each followed by a Selection; a function provider supplies the route planner (Route) and Top N operators; the plan runs on a cycle provider.]

Page 8

Open QP with ObjectGlobe

• Create an open marketplace for
  – data providers
  – cycle providers
  – function providers

• Requirements
  – wrappers exist for all data of data providers
  – JVM runs on all cycle providers
  – fixed interface for operators of function providers

Page 9

Scenarios

• Free Internet: everything is free and available for everybody

• Restricted Internet: charge according to usage, quality, and timeliness; restrictions (e.g., age)

• Intranet: everything is free and available for "insiders"

• Outsourcing: charge for certain services (e.g., backup, business analyses)

Page 10

Challenges

• Lookup Service
  – Find the relevant services

• Security
  – Protect data and cycle providers from bad code

• Quality of Service
  – What you pay is what you get

Page 11

ObjectGlobe Lookup-Service

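[Figure: providers register their services with the Lookup-Service; applications and users browse and search it; the Lookup-Service supplies authorization data, statistics, cost information, etc. to the query processor (Parser, Optimizer, Execution Engine).]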

Page 12

Description of Services

• Providers register RDF or XML documents

• There is a pre-defined schema to describe services

• Data Providers:
  – Theme (e.g., Hotel)
  – Attributes (e.g., rate, location, category)
  – Access paths and wrappers
  – Characteristics of the server (e.g., availability)
  – Information for authorization
  – Statistics
  – ...

Page 13

• Function Provider:
  – Signature (e.g., foo(int, int) -> int)
  – Information for authorization
  – Hardware requirements (e.g., 30 MB main memory)
  – Size of Java byte code
  – ...

• Cycle Provider:
  – Hardware (e.g., 1 GB main memory)
  – Location and network connections / bandwidth
  – Information for authorization
  – ...

Page 14

XML Description of a Data Provider

<DataProvider>
  <id> 4711 </id>
  <theme>
    <name> Hotel </name>
    <desc> All hotels you ever want </desc>
  </theme>
  <Attribute>
    <topic> city </topic>
    <type> string </type>
  </Attribute>
  ...

Page 15

Lookup Query

• Data Providers for Hotels that return the City and Rate of each hotel

search DataProvider d
select d.uniqueId, d.attr.*
where d.theme.name = "hotel" and
      d.attr.?.topic = "city" and
      d.attr.?.topic = "rate"
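To make the semantics of this lookup query concrete, here is a minimal Python sketch. It assumes that `d.attr.?.topic = "x"` means "some attribute of d has topic x"; that reading, the dictionary layout, the function name `lookup`, and the sample providers are illustrative assumptions, not the ObjectGlobe implementation.

```python
# Hypothetical in-memory copies of registered data-provider descriptions.
PROVIDERS = [
    {"uniqueId": 4711, "theme": {"name": "hotel"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "int"}]},
    {"uniqueId": 4712, "theme": {"name": "hotel"},
     "attr": [{"topic": "city", "type": "string"}]},
    {"uniqueId": 4713, "theme": {"name": "flight"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "int"}]},
]


def lookup(providers, theme, topics):
    """Return (uniqueId, attr) for providers of `theme` offering every topic."""
    return [
        (p["uniqueId"], p["attr"])
        for p in providers
        if p["theme"]["name"] == theme
        # Assumed meaning of "d.attr.?.topic = t": some attribute has topic t.
        and all(any(a["topic"] == t for a in p["attr"]) for t in topics)
    ]


matches = lookup(PROVIDERS, "hotel", ["city", "rate"])
print([uid for uid, _ in matches])
```

Only the first sample provider qualifies: the second lacks a rate attribute and the third has a different theme.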

Page 16

Three-tier Architecture

• Local Lookup-Servers
  – Keep copies of meta-data of services that are relevant for a particular organization or subsidiary
  – Evaluate Lookup requests for that organization
  – Relevance is determined by subscription rules (queries)

• Public Lookup-Servers (Backbone)
  – Store all (public) meta-data
  – Store subscription rules of local Lookup-Servers
  – Notify local Lookup-Servers of changes
  – Users can browse in the public info of the backbone

Page 17

Three-tier Architecture

[Figure: three tiers. Clients exchange queries, answers, and new rules with Local Lookup-Servers (Local LS); the Local LS exchange new rules and answers with the Public Lookup-Servers; updates and inserts arrive at the Public Lookup-Servers.]

Page 18

• Processing Lookup requests
  – Local Lookup-Servers store meta-data in RDBMS
  – Translate Lookup request into SQL

• Registering new services
  – Public Lookup-Servers store meta-data in RDBMS
  – Public Lookup-Servers store rules in RDBMS
  – Apply filter algorithm using RDBMS in order to find relevant local Lookup-Servers

• Deletes and updates of services
  – Apply filter algorithm to find affected local Lookup-Servers (more complicated, however)

• Principle: Map everything to RDBMS

Page 19

Storing XML Data in an RDBMS

<person, id = 4711>
  <name> Lilly Potter </name>
  <child>
    <person, id = 314>
      <name> Harry Potter </name>
    </person>
  </child>
</person>

<person, id = 666>
  <name> James Potter </name>
  <child> 314 </child>
</person>

[Figure: document graph. Root node 0 has two person edges, to nodes 4711 and 666. Node 4711 has a name edge to "Lilly Potter", node 666 a name edge to "James Potter". Both have a child edge to node i314, whose name edge leads to "Harry Potter".]

Page 20

Edge Approach

Edge Table

Source  Label   Target
0       person  4711
0       person  666
4711    name    v1
4711    child   i314
666     name    v2
666     child   i314

Value Table (String)

Id  Value
v1  Lilly Potter
v2  James Potter
v3  Harry Potter

Value Table (Integer)

Id  Value
v4  12
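The shredding step behind these tables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ObjectGlobe code: the input is well-formed XML in which objects carry an `id` attribute and references use a hypothetical `ref` attribute, the nested person is flattened to top level, and the function name `shred` is invented. Only string values are handled.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the example document (flattened, with "ref").
DOC = """
<db>
  <person id="4711"><name>Lilly Potter</name><child ref="314"/></person>
  <person id="666"><name>James Potter</name><child ref="314"/></person>
  <person id="314"><name>Harry Potter</name></person>
</db>
"""


def shred(elem, source, edges, values):
    """Append one (source, label, target) edge per sub-element of `elem`."""
    for sub in elem:
        if "id" in sub.attrib:
            # Object node: the edge targets the object's id, then recurse.
            target = sub.attrib["id"]
            edges.append((source, sub.tag, target))
            shred(sub, target, edges, values)
        elif "ref" in sub.attrib:
            # Reference to an object that is stored elsewhere.
            edges.append((source, sub.tag, sub.attrib["ref"]))
        else:
            # Leaf: the text goes to the value table under a generated id.
            vid = f"v{len(values) + 1}"
            values.append((vid, (sub.text or "").strip()))
            edges.append((source, sub.tag, vid))


edges, values = [], []
shred(ET.fromstring(DOC.strip()), "0", edges, values)
for row in edges:
    print(row)
```

Because the edge table never mentions the schema, the same two tables can hold any document, which is the point of the approach.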

Page 21

XML Queries

• Find the name of all persons that like to play Quidditch and are younger than 18 years

select $n
where <person>
        <name> $n </name>
        <age> $a </age>
        <hobby> Quidditch </hobby>
      </person>, $a < 18

• Carry out pattern matching against the "document graph"

Page 22

Translation to SQL

SELECT nv.value
FROM Edge p, Edge n, Edge h, Value nv, Value hv
WHERE p.label = 'person' AND
      p.target = n.source AND n.label = 'name' AND n.target = nv.id AND
      p.target = h.source AND h.label = 'hobby' AND h.target = hv.id AND
      hv.value = 'Quidditch';

Works essentially in the same way for the query language of our Lookup service.
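The translation can be tried out with a small runnable sketch, assuming Python's standard `sqlite3` module as a stand-in for the RDBMS. The SQL on this slide covers the name and hobby patterns; the sketch also adds the `$a < 18` condition from the XML query through a separate integer value table. The table names `StrValue` and `IntValue`, the ids, and the sample ages and hobbies are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Edge     (source TEXT, label TEXT, target TEXT);
CREATE TABLE StrValue (id TEXT PRIMARY KEY, value TEXT);
CREATE TABLE IntValue (id TEXT PRIMARY KEY, value INTEGER);
""")
con.executemany("INSERT INTO Edge VALUES (?, ?, ?)", [
    ("0", "person", "314"), ("314", "name", "v3"),
    ("314", "age", "v4"), ("314", "hobby", "v5"),
    ("0", "person", "666"), ("666", "name", "v2"),
    ("666", "age", "v6"), ("666", "hobby", "v7"),
])
con.executemany("INSERT INTO StrValue VALUES (?, ?)", [
    ("v2", "James Potter"), ("v3", "Harry Potter"),
    ("v5", "Quidditch"), ("v7", "Quidditch"),
])
con.executemany("INSERT INTO IntValue VALUES (?, ?)", [("v4", 12), ("v6", 40)])

# One Edge alias per pattern edge, one value-table alias per leaf.
QUERY = """
SELECT nv.value
FROM Edge p, Edge n, Edge h, Edge a,
     StrValue nv, StrValue hv, IntValue av
WHERE p.label = 'person'
  AND p.target = n.source AND n.label = 'name'  AND n.target = nv.id
  AND p.target = h.source AND h.label = 'hobby' AND h.target = hv.id
  AND hv.value = 'Quidditch'
  AND p.target = a.source AND a.label = 'age'   AND a.target = av.id
  AND av.value < 18
"""
names = [row[0] for row in con.execute(QUERY)]
print(names)
```

Each edge of the query pattern costs one self-join on the edge table, which is where the "many joins" of this approach come from.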

Page 23

Publish & Subscribe Algorithm

• Decompose subscription rules and store them in the RDBMS of the Public Lookup-Servers

• SQL join queries match sub-rules with meta-data objects (recall: meta-data is decomposed, too)

• SQL join queries re-construct matching subscription rules from sub-rules

Page 24

Decomposition of Subscription Rules

• Data Providers for Stock Market Information that cost less than 500 Dollars:
    search DataProvider d
    where d.theme.name = "Stock Market" and d.cost < 500

• Decomposition into three atomic rules:
    R1: search Theme t where t.name = "Stock Market"
    R2: search DataProvider d where d.cost < 500
    R3: search R1 a, R2 b where b.theme = a

• Store these rules in RDBMS

    Rule  Class      Operator  Attribute  Value
    R1    Theme      =         name       Stock Mkt
    R2    DataProv.  <         cost       500

Page 25

Matching

Rule  Class      Operator  Attribute  Value
R1    Theme      =         name       Stock Mkt
R2    DataProv.  <         cost       500

Object  Type       Attribute    Value
O1      Theme      name         Stock Mkt
O1      Theme      description  SE InfoSys
O2      DataProv.  theme        O1
O2      DataProv.  attr         O3 (price)
O2      DataProv.  attr         O4 (security ID)
O2      DataProv.  cost         70

Result of Join: (R1, O1); (R2, O2)
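The matching join over these two tables can be reproduced with a short sketch, again assuming Python's `sqlite3` module in place of the RDBMS. The table and column names (`AtomicRule`, `MetaData`, `cls`, `attr`, `val`) and the handling of only the `=` and `<` operators are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE AtomicRule (rule_id TEXT, cls TEXT, op TEXT, attr TEXT, val TEXT);
CREATE TABLE MetaData   (obj_id TEXT, cls TEXT, attr TEXT, val TEXT);
""")
con.executemany("INSERT INTO AtomicRule VALUES (?, ?, ?, ?, ?)", [
    ("R1", "Theme", "=", "name", "Stock Mkt"),
    ("R2", "DataProv", "<", "cost", "500"),
])
con.executemany("INSERT INTO MetaData VALUES (?, ?, ?, ?)", [
    ("O1", "Theme", "name", "Stock Mkt"),
    ("O1", "Theme", "description", "SE InfoSys"),
    ("O2", "DataProv", "theme", "O1"),
    ("O2", "DataProv", "attr", "O3"),
    ("O2", "DataProv", "attr", "O4"),
    ("O2", "DataProv", "cost", "70"),
])

# A rule matches an object row of the same class and attribute whose value
# satisfies the rule's operator; numeric comparison needs an explicit cast.
MATCH = """
SELECT DISTINCT r.rule_id, o.obj_id
FROM AtomicRule r
JOIN MetaData o ON r.cls = o.cls AND r.attr = o.attr
WHERE (r.op = '=' AND o.val = r.val)
   OR (r.op = '<' AND CAST(o.val AS REAL) < CAST(r.val AS REAL))
ORDER BY r.rule_id
"""
pairs = con.execute(MATCH).fetchall()
print(pairs)
```

A single join evaluates all stored atomic rules against all newly registered objects at once, which is why registering providers in batches pays off.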

Page 26

Re-constructing Subscription Rules from matching atomic sub-rules

• Store decomposition graph in RDBMS
  – Higher-level and atomic rules are vertices
  – Top-level rules are so-called triggering rules; if they are affected, notify the local Lookup-Server (LLS)

• Walk "bottom up" through decomposition graph
  – SQL join query: for each pair of matching rules, find out whether they have a common parent
  – N.B. the decomposition graph is a binary, directed, acyclic graph
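The bottom-up walk can be sketched as a join that is repeated until no new matches appear. The sketch assumes Python's `sqlite3` module, a hypothetical `Decomp` table with one row per binary join rule (parent, two children, and the attribute linking the right child's object to the left child's object), and the atomic matches (R1, O1) and (R2, O2) from the matching step. It illustrates the idea and is not the authors' schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MetaData (obj_id TEXT, attr TEXT, val TEXT);
CREATE TABLE Matched  (rule_id TEXT, obj_id TEXT, UNIQUE (rule_id, obj_id));
CREATE TABLE Decomp   (parent TEXT, left_rule TEXT, right_rule TEXT,
                       link_attr TEXT, is_top INTEGER);
""")
con.executemany("INSERT INTO MetaData VALUES (?, ?, ?)", [
    ("O1", "name", "Stock Mkt"), ("O2", "theme", "O1"), ("O2", "cost", "70"),
])
con.executemany("INSERT INTO Matched VALUES (?, ?)", [("R1", "O1"), ("R2", "O2")])
# R3: search R1 a, R2 b where b.theme = a  (a triggering rule)
con.execute("INSERT INTO Decomp VALUES ('R3', 'R1', 'R2', 'theme', 1)")

# One level of the walk: a parent matches if both children match and the
# right child's object links to the left child's object via link_attr.
STEP = """
INSERT OR IGNORE INTO Matched (rule_id, obj_id)
SELECT d.parent, r.obj_id
FROM Decomp d
JOIN Matched l  ON l.rule_id = d.left_rule
JOIN Matched r  ON r.rule_id = d.right_rule
JOIN MetaData m ON m.obj_id = r.obj_id
               AND m.attr = d.link_attr AND m.val = l.obj_id
"""


def count_matches():
    return con.execute("SELECT COUNT(*) FROM Matched").fetchone()[0]


while True:  # repeat until a pass adds no new (rule, object) pair
    before = count_matches()
    con.execute(STEP)
    if count_matches() == before:
        break

triggered = [row[0] for row in con.execute("""
    SELECT DISTINCT m.rule_id FROM Matched m
    JOIN Decomp d ON d.parent = m.rule_id WHERE d.is_top = 1""")]
print(triggered)
```

Because the graph is acyclic, the loop stops after at most as many passes as the graph is deep.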

Page 27

Preliminary Experiments

• Synthetic benchmark database with 100,000 (different) subscription rules

• Oracle 8i used in the Public Lookup Server (PLS)

# new providers   Proc. time (PLS)
1                 250 msecs
100 (batch)       5000 msecs

Batch updates are crucial: a batch of 100 costs 50 msecs per provider instead of 250 msecs

Page 28

Summary

• Basic Principle: decompose rules and data

• Advantages:
  – Generic, independent of schema
  – Very easy to implement, no administration needed
  – Exploit query capabilities of RDBMS
  – Need not worry about document boundaries
  – Finding common sub-rules is trivial

• Disadvantage:
  – Sub-optimal query performance (many joins), but probably sufficient if updates are batched

Page 29

Related Work

• Lookup Services: Jini, UDDI, Plug & Play

• Publish & Subscribe:
  – IR world
  – SIFT (Stanford)
  – XFilter (Berkeley)
  – LeSelect (INRIA)
  – Continuous Queries (Niagara, ...)

• Storing and Indexing XML Data: ...

Page 30

Outline

• Background

• The ObjectGlobe Lookup Service

• (Security Aspects)

• QoS Management

• Summary

Page 31

Security Requirements in ObjectGlobe

• Protection of Data and Cycle Providers

• Secure Communication
  – use SSL connections (authenticated and encrypted)

• Authentication of Clients
  – passwords / certificates
  – digitally signed requests (query subplans)

• Authorization control
  – data/cycle providers are autonomous
  – but register user privileges in lookup service

Page 32

Security of Data/Cycle Providers

[Figure: Query 1, Query 2, and Query 3 arrive from the Internet. Inside the ObjectGlobe runtime system, each query has its own class loader within a secure sandbox, alongside an internal class loader.]

Page 33

Privileged Built-in Operators for Disk or Network Access

[Figure: an external operator inside the sandbox uses an internal (built-in) operator to reach a tmpfile.]

Page 34

QoS Management

• State of the Art: best-effort

• Goal: users should be able to constrain
  – Cost of execution
  – Running time
  – Quality of the results

• Initial approach (to get a feeling)
  – Extended query optimization
  – Admission control
  – Monitoring and plan adaptions at execution time

• Real solution: ???

Page 35

Quality Parameters

• Cost of execution
  – $

• Running time
  – First tuple, last tuple, Nth tuple

• Quality of the results
  – Number of results
  – Coverage: number (or %) of data sources queried
  – Staleness of data

• Cost as a function of coverage (-> Mariposa)

• Cost as a function of #wheels (Mercedes)

Page 36

Quality of Service-Parameters

[Figure: three axes, Completeness (min), Cost in € (max), and Response time (max), bounding the desired space for query plans.]

Page 37

Extended Query Optimization

Bottom-up dynamic programming query optimizer, standard costing etc., and the following extensions:

1. Generate alternatives for each operator
   • Consider classes of equivalent providers
2. Extended pruning, heuristics for choosing a winner
3. Enumerate "incomplete" UNIONs
4. Initialize QoS accounts
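One way to picture extended pruning is shown in the following Python sketch. The slides do not spell out the pruning rule, so the sketch swaps in plain Pareto dominance over cost, response time, and completeness, preceded by a check of the user's QoS constraints. The `Plan` type, the function names, and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    name: str
    cost: float          # lower is better
    time: float          # lower is better
    completeness: float  # higher is better (fraction of sources covered)


def dominates(a: Plan, b: Plan) -> bool:
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    no_worse = (a.cost <= b.cost and a.time <= b.time
                and a.completeness >= b.completeness)
    better = (a.cost < b.cost or a.time < b.time
              or a.completeness > b.completeness)
    return no_worse and better


def prune(plans: List[Plan], max_cost: float, max_time: float,
          min_completeness: float) -> List[Plan]:
    """Drop plans that violate the constraints, then drop dominated plans."""
    legal = [p for p in plans
             if p.cost <= max_cost and p.time <= max_time
             and p.completeness >= min_completeness]
    return [p for p in legal if not any(dominates(q, p) for q in legal)]


candidates = [
    Plan("P", cost=3, time=10, completeness=0.9),
    Plan("Q", cost=5, time=8, completeness=0.6),
    Plan("R", cost=4, time=12, completeness=0.9),
    Plan("S", cost=1, time=5, completeness=0.3),   # too incomplete
]
survivors = prune(candidates, max_cost=6, max_time=15, min_completeness=0.4)
print([p.name for p in survivors])
```

With several quality dimensions, more than one plan can survive pruning, which is why a separate heuristic is needed to choose the winner.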

Page 38

Query Optimization: Quality of Service-Considerations

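[Figure: candidate query evaluation plans (QEPs) P, Q, and R plotted by Completeness and Cost, with a 40% mark on the completeness axis and a region labeled "illegal QEP".]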

Page 39

QoS-Annotated Query Plan

[Figure: query plan with the operators scan, scan, wrap_S, thumbnail, and display, each annotated with its host (A.com, B.com, or client). QoS accounts holding cost, timeO, and timeN are attached to the plan.]

Page 40

Optimization: Open Questions

• Revisit heuristics to choose winning plan
  – Dynamic heuristics depending on workload and/or feedback

• Reverse engineering a plan
  – How much data should a plan read if the cost should be $5.00?

• Does query optimization matter?

Page 41

Admission Control & Monitoring

• Admission Control:
  – Check assumptions of optimizer
  – Carried out at plan instantiation time for each plan fragment (set of operators at one site)

• Monitoring:
  – Predict quality of results at the end of execution
  – Carried out by special Monitoring operators

• Take actions if violations are detected
  – ECA rules specify actions

Page 42

Monitoring Operators

• at the end of pipelines
• are non-blocking / low cost
• above "receive" ops
• keep statistics for predictions
• differentiate between "open" and "next" phase
• communicate with each other for liveness

[Figure: example plan with sub-plans A and B, send and receive operators, a Join, and monitor operators.]
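A monitoring operator of this kind can be pictured as a thin pass-through iterator. The Python sketch below is an assumption-laden illustration, not the ObjectGlobe operator: the class name `Monitor`, the injected `clock`, the expected cardinality, and the linear extrapolation used for the prediction are all invented here. It separates the time until the first tuple (the "open" phase) from the per-tuple rate afterwards (the "next" phase).

```python
import time


class Monitor:
    """Pass-through operator that records timing statistics per tuple."""

    def __init__(self, child, expected_tuples, clock=time.monotonic):
        self.child = child                # any iterable of tuples
        self.expected = expected_tuples   # optimizer's cardinality estimate
        self.clock = clock
        self.seen = 0
        self.opened_at = None
        self.first_at = None
        self.last_at = None

    def __iter__(self):
        self.opened_at = self.clock()     # "open" phase starts
        for tup in self.child:
            now = self.clock()
            if self.first_at is None:
                self.first_at = now       # first tuple: "next" phase starts
            self.last_at = now
            self.seen += 1
            yield tup                     # non-blocking: forward immediately

    def predicted_total_time(self):
        """Extrapolate linearly from the tuples seen so far."""
        if self.seen == 0:
            return None
        open_time = self.first_at - self.opened_at
        per_tuple = ((self.last_at - self.first_at) / (self.seen - 1)
                     if self.seen > 1 else 0.0)
        return open_time + per_tuple * (self.expected - 1)


# Demo with a fake clock so that the numbers are reproducible.
ticks = iter([0.0, 2.0, 3.0, 4.0])
monitor = Monitor(iter("abcd"), expected_tuples=11, clock=lambda: next(ticks))
stream = iter(monitor)
first_three = [next(stream) for _ in range(3)]
prediction = monitor.predicted_total_time()
print(first_three, prediction)
```

In the demo the first tuple arrives after 2 time units and later tuples arrive 1 unit apart, so 11 expected tuples extrapolate to 12 units in total.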

Page 43

Plan Adaptions

• General: Abort, Restart / Reoptimize

• Response Time Violation:
  – compressConnection
  – movePlan (w/wo state)
  – increasePriority
  – removeTempResults, ...

• Coverage / Result Quality Violation:
  – addSubPlan

• Cost Violation:
  – movePlan, decreasePriority, ...

Page 44

ECA Rules for Adaptions

if cost is high and coverage is low then abort

if cost is high and coverage is high then delResults

if rt is high and cost is low and network is critical then compress
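The three rules above can be written down as condition/action pairs, as in the following Python sketch. The thresholds for "high" and "low", the normalization of each quantity to a fraction of its budget or target, the `Status` type, and the first-match policy are assumptions; the slides leave the thresholds as an open question.

```python
from dataclasses import dataclass
from typing import Optional

HIGH, LOW = 0.8, 0.2   # hypothetical thresholds for "high" and "low"


@dataclass
class Status:
    cost: float              # fraction of the cost budget already spent
    coverage: float          # fraction of data sources covered so far
    rt: float                # fraction of the response-time budget used
    network_critical: bool   # True if the network is the bottleneck


# (condition, action) pairs in the order of the rules on the slide.
RULES = [
    (lambda s: s.cost >= HIGH and s.coverage <= LOW, "abort"),
    (lambda s: s.cost >= HIGH and s.coverage >= HIGH, "delResults"),
    (lambda s: s.rt >= HIGH and s.cost <= LOW and s.network_critical, "compress"),
]


def choose_action(status: Status) -> Optional[str]:
    """Return the action of the first rule whose condition holds, if any."""
    for condition, action in RULES:
        if condition(status):
            return action
    return None


print(choose_action(Status(cost=0.9, coverage=0.1, rt=0.5, network_critical=False)))
print(choose_action(Status(cost=0.1, coverage=0.5, rt=0.9, network_critical=True)))
```

The event part of each ECA rule is implicit here: `choose_action` would be called whenever a monitoring operator reports new statistics.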

Page 45

Plan Adaptions: Open Questions

• What is the right mix of actions?

• What are the right thresholds for the rules?

• How to avoid the "pork cycle" (Schweinezyklus), i.e., oscillating over-corrections?

• How to draw the right conclusions from the statistics produced by Monitoring?

• What is the right granularity of actions? Plan vs. Operator vs. Tuple

Page 46

Project Status

• First demo presented at SIGMOD 99
  – Travel information
  – Four Web data sources (hotels, sights, train connections)
  – One function provider (travel routes, top N)
  – Three cycle providers (two in Europe, one in US)

• Online-Demo: http://db.fmi.uni-passau.de/projects/OG

• Current work: more experiments
  – Problem: getting data from Web sources is sloooow