Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources Mary Tork Roth Peter Schwarz IBM Almaden



Road Map

• Motivation
• Garlic Overview
• Wrapper Architecture
    – Data Definition
    – Query Planning
    – Query Execution
• Good, Bad, and Ugly

Motivation

• “Real Companies”
• Heavy investment in legacy
    – Data management wares
    – Application woes
• Need an integrated view of heterogeneous data sources
    – Leverage existing query facilities
    – Work around idiosyncrasies

Garlic Architecture

[Figure: Garlic architecture. Components shown: three clients; the Garlic query processor and Garlic metadata; four wrappers, one in front of each data source (relational DB, object DB, image archive, complex objects).]

Wrapper Goals

• Small start-up cost
    – Wizards are not the only ones writing wrappers
• Incremental growth
    – Wrappers must be able to evolve
    – Add new sources without disturbing existing ones
• Must be able to optimize queries
    – Enable participation, not delegation

Wrapper Overview

[Figure: Wrapper overview. A wrapper sits in front of a data source. Labels shown, grouped by service: Garlic objects (method invocation); planning (work request, wrapper plan, query plan); execution (execution plan, iterator).]

Modeling Data

• Object Data Model
    – Interface and Implementation
    – GDL variant of ODMG-ODL
• Wrapper assigns IDs to objects
    – OID = IID + key
• Methods
    – default accessor methods
    – stub and generic dispatch
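The identifier scheme above can be illustrated with a minimal sketch. It models an OID as the pair (implementation ID, key) and routes a method call to whichever wrapper is registered for that implementation ID. Every name here (`OID`, `CountryWrapper`, `WRAPPERS`, `dispatch`, the `get_<attribute>` accessor convention) is hypothetical and chosen for illustration; none of it is Garlic's actual API.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class OID:
    """Object ID = implementation ID (IID) + wrapper-chosen key."""
    iid: str   # identifies the implementation, and so the owning wrapper
    key: str   # meaningful only to that wrapper, e.g. a primary key


class CountryWrapper:
    """Hypothetical wrapper over an in-memory stand-in for a data source."""

    def __init__(self) -> None:
        self._rows = {"GR": {"name": "Greece", "visa_required": False}}

    def invoke(self, key: str, method: str, *args: Any) -> Any:
        # Generic dispatch: assume default accessors are named get_<attribute>.
        if method.startswith("get_"):
            return self._rows[key][method[len("get_"):]]
        raise NotImplementedError(method)


# Registry mapping IIDs to wrappers (an assumption made for this sketch).
WRAPPERS: Dict[str, CountryWrapper] = {"country_impl": CountryWrapper()}


def dispatch(oid: OID, method: str, *args: Any) -> Any:
    """Route a method call to the wrapper that owns the object."""
    return WRAPPERS[oid.iid].invoke(oid.key, method, *args)


greece = OID(iid="country_impl", key="GR")
print(dispatch(greece, "get_name"))  # prints Greece
```

Because the key part is opaque to everything but the owning wrapper, each wrapper is free to reuse whatever identifier its source already has.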

Modeling Data Example

interface Country {
    attribute string name;
    attribute string airlines_served;
    attribute boolean visa_required;
    attribute Image scene;
}

interface Image {
    attribute readonly string file_name;
    double matches(in string file_name);
    void display(in string device_name);
}

Query Planning

• Like System-R, bottom-up dynamic programming
• Wrapper tells what it can do through methods
    – plan_access() for single collections
    – plan_join() for multi-way joins
    – plan_bind() for inner streams of joins
• Input: work request
• Output: set of plans, cost, cardinalities?
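To make the planning protocol above concrete, here is a minimal sketch of a bottom-up, dynamic-programming enumerator that asks a wrapper for plans through the three entry points. Only the method names `plan_access`, `plan_join`, and `plan_bind` come from the slides. Their signatures, the `Plan` class, `ToyWrapper`, the cost formula, and `best_plan` are assumptions made for illustration, and the enumerator considers every split of a subset rather than reproducing any particular optimizer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Plan:
    collections: FrozenSet[str]  # which collections this plan covers
    cost: float                  # wrapper-reported cost estimate
    description: str


class Wrapper(ABC):
    """Hypothetical Python rendering of the three planning entry points."""

    @abstractmethod
    def plan_access(self, collection: str) -> List[Plan]: ...

    @abstractmethod
    def plan_join(self, left: Plan, right: Plan) -> List[Plan]: ...

    def plan_bind(self, inner: Plan) -> List[Plan]:
        return []  # a wrapper may decline to offer bind plans


class ToyWrapper(Wrapper):
    def __init__(self, scan_costs: Dict[str, float]) -> None:
        self.scan_costs = scan_costs

    def plan_access(self, collection: str) -> List[Plan]:
        return [Plan(frozenset([collection]), self.scan_costs[collection],
                     f"scan({collection})")]

    def plan_join(self, left: Plan, right: Plan) -> List[Plan]:
        # Made-up cost formula, purely for the example.
        return [Plan(left.collections | right.collections,
                     left.cost + right.cost + 1.0,
                     f"join({left.description}, {right.description})")]


def best_plan(wrapper: Wrapper, collections: Sequence[str]) -> Plan:
    """Keep the cheapest plan per subset, building small subsets first."""
    best: Dict[FrozenSet[str], Plan] = {}
    for c in collections:
        best[frozenset([c])] = min(wrapper.plan_access(c), key=lambda p: p.cost)
    for size in range(2, len(collections) + 1):
        for subset in combinations(collections, size):
            target = frozenset(subset)
            candidates: List[Plan] = []
            for k in range(1, size):
                for left_items in combinations(subset, k):
                    left = frozenset(left_items)
                    candidates += wrapper.plan_join(best[left], best[target - left])
            best[target] = min(candidates, key=lambda p: p.cost)
    return best[frozenset(collections)]


wrapper = ToyWrapper({"Countries": 10.0, "Cities": 50.0})
print(best_plan(wrapper, ["Countries", "Cities"]).cost)  # prints 61.0
```

The wrapper never sees the whole query here. It only answers requests about the pieces it is asked to plan, which is one way to read "participation, not delegation".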

Single Collections

• Work Request
    – Attributes to project upon
    – Selections, and methods to invoke
• Wrapper response
    – Which projections, selections it supports
    – Cost of plan
    – Instances of Wrapper_Plan class
    – Include private data for plan execution
    – Execute a plan which subsumes request?

Single Collection Access Plan

select H.name, H.city, H.daily_rate
from Hotels H
where H.class = 5 and H.loc = 'beach'

[Figure: the Garlic Optimizer sends a work request to the Web Wrapper, which sits in front of the Hotel Repository and answers with a wrapper access plan.]

Work Request
    Project: H.OID, H.name, H.city, H.daily_rate, H.class, H.loc
    Preds: H.class = 5, H.loc = 'beach'

Wrapper Access Plan - Wrapper_Plan class
    Properties
        Project: H.OID, H.name, H.city, H.daily_rate, H.class, H.loc
        Preds: H.class = 5
        Cost: <access cost>
    Plan details (private)
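The exchange above can be sketched in a few lines: the wrapper accepts only the predicate it can evaluate (`class = 5`), and the optimizer is left to apply the remaining one (`loc = 'beach'`) itself. The class names, the `SUPPORTED` table, the cost value, and the `url_params` private detail are all hypothetical. Only the projection list and the predicates are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Predicate = Tuple[str, str, Any]  # (attribute, operator, constant)


@dataclass
class WorkRequest:
    project: List[str]
    preds: List[Predicate]


@dataclass
class WrapperPlan:
    project: List[str]
    preds: List[Predicate]  # predicates the wrapper promises to apply
    cost: float
    private: Dict[str, Any] = field(default_factory=dict)  # opaque to the optimizer


class WebHotelWrapper:
    """Hypothetical wrapper whose source can filter only on 'class'."""

    SUPPORTED = {("class", "=")}

    def plan_access(self, request: WorkRequest) -> List[WrapperPlan]:
        handled = [p for p in request.preds if (p[0], p[1]) in self.SUPPORTED]
        return [WrapperPlan(
            project=list(request.project),
            preds=handled,
            cost=100.0,  # placeholder; a real wrapper would estimate this
            private={"url_params": {p[0]: p[2] for p in handled}},
        )]


def residual_predicates(request: WorkRequest, plan: WrapperPlan) -> List[Predicate]:
    """Predicates the optimizer must still evaluate on the wrapper's output."""
    return [p for p in request.preds if p not in plan.preds]


request = WorkRequest(
    project=["OID", "name", "city", "daily_rate", "class", "loc"],
    preds=[("class", "=", 5), ("loc", "=", "beach")],
)
plan = WebHotelWrapper().plan_access(request)[0]
print(plan.preds)                          # prints [('class', '=', 5)]
print(residual_predicates(request, plan))  # prints [('loc', '=', 'beach')]
```

One plausible reading of why the work request projects `class` and `loc` even though the query does not select them: the optimizer needs those attributes in the result if it ends up evaluating a predicate the wrapper declined.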

Join Plans

• Request
    – Plans to join
    – Join Predicate
• Wrapper response
    – Join plan with supported predicates
    – Cost of join

Join Plans

select I.name
from Countries C, Cities I
where C.name = 'Greece' and I.pop < 500 and I.country = C.OID

[Figure: the Garlic Optimizer sends a work request to the Relational Wrapper, which sits in front of the Relational DB and answers with a wrapper join plan.]

Work Request
    Input Plans
        Wrapper Access Plan
            Project: C.OID, C.name
            Preds: C.name = 'Greece'
            Cost: <xx>
            Plan details (private)
        Wrapper Access Plan
            Project: I.OID, I.name...
            Preds: I.pop < 500
            Cost: <xx>
            Plan details (private)
    Join pred: I.country = C.OID

Wrapper Join Plan - Countries, Cities
    Project: C.OID, C.name, I.OID, I.name, I.pop, I.country
    Preds: C.name = 'Greece', I.pop < 500, I.country = C.OID
    Cost: <join cost>
    Plan details (private)
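As a rough sketch of the join request above, the code below combines two access plans that the same wrapper produced earlier into one join plan by merging their projections and predicates and adding the join predicate. The `WrapperPlan` fields, the `to_sql` helper, the additive cost, and the numeric costs are assumptions for illustration. The Cities projection is filled in from the join plan's projection list, since the slide elides it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WrapperPlan:
    tables: List[str]   # e.g. ["Countries C"]
    project: List[str]
    preds: List[str]
    cost: float

    def to_sql(self) -> str:
        """Private detail: the SQL this plan would send to the source."""
        sql = f"select {', '.join(self.project)} from {', '.join(self.tables)}"
        if self.preds:
            sql += " where " + " and ".join(self.preds)
        return sql


def plan_join(left: WrapperPlan, right: WrapperPlan,
              join_preds: List[str]) -> List[WrapperPlan]:
    """Hypothetical relational wrapper: push the whole join to the source."""
    return [WrapperPlan(
        tables=left.tables + right.tables,
        project=left.project + right.project,
        preds=left.preds + right.preds + join_preds,
        cost=left.cost + right.cost,  # placeholder cost formula
    )]


countries = WrapperPlan(["Countries C"], ["C.OID", "C.name"],
                        ["C.name = 'Greece'"], 10.0)
cities = WrapperPlan(["Cities I"], ["I.OID", "I.name", "I.pop", "I.country"],
                     ["I.pop < 500"], 50.0)

join = plan_join(countries, cities, ["I.country = C.OID"])[0]
print(join.to_sql())
```

A wrapper that cannot evaluate a join at its source could simply return an empty list, leaving the optimizer to join the two access plans itself.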

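Inter Site Joins

select C.pop, H.name
from Cities C, Hotels H
where C.name = H.loc

Site A: Cities - C
Site B: Hotels - H

[Figure: three diagrams, each showing sites A and B beneath Garlic, with a different data flow for the join. Surviving labels: (1) H; H C. (2) H C; H C. (3) Hsub; Hsub.loc; Hsub C.]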

Bind Plans

• Inter wrapper join
• Fetch matches
    – Values produced by outer node
    – Inner node invoked for each/set of values
    – Like semi or filter join
• Same request and reply pairs
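The "fetch matches" idea above can be sketched as a loop that takes each value produced by the outer node and invokes the inner node with it. The function names and the sample rows (including the population figures) are made up for illustration. The join condition `C.name = H.loc` is the one from the inter-site join slide.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def bind_join(outer: Iterable[Row], outer_attr: str,
              fetch_inner: Callable[[Any], Iterable[Row]]) -> Iterator[Row]:
    """For each outer row, bind its join value and fetch the inner matches."""
    for outer_row in outer:
        for inner_row in fetch_inner(outer_row[outer_attr]):
            yield {**outer_row, **inner_row}


# Made-up stand-in for the inner source (Cities at site A).
CITIES: List[Row] = [
    {"C.name": "Athens", "C.pop": 3000},
    {"C.name": "Paros", "C.pop": 12},
]


def fetch_cities(name: str) -> List[Row]:
    """Inner node: return only the cities matching the bound value."""
    return [c for c in CITIES if c["C.name"] == name]


# Made-up outer rows (Hotels at site B).
hotels: List[Row] = [
    {"H.name": "Sea View", "H.loc": "Paros"},
    {"H.name": "Nowhere Inn", "H.loc": "Atlantis"},
]

result = list(bind_join(hotels, "H.loc", fetch_cities))
print(result)
```

Only the join values travel to the inner source and only matching rows come back, which is why the slide compares it to a semi join or filter join. Passing a set of values per call instead of one would cut the number of round trips.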

Query Execution

• Garlic plan looks like tree with wrapper plans as leaves

• Wrapper exports iterator interface
    – Translate plan into iterator
    – Methods supported
        • reset()
        • advance()
        • bind()
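A minimal sketch of what such an iterator might look like follows. The three method names come from the slide. Their exact semantics here (`advance()` returns the next row or `None`, `reset()` restarts the scan, `bind()` supplies a new outer value and restarts) are assumptions, and `ListIterator` is a hypothetical stand-in that scans an in-memory list rather than a real source.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class ListIterator:
    """Hypothetical wrapper iterator over an in-memory list of rows."""

    def __init__(self, rows: List[Row], bind_attr: Optional[str] = None) -> None:
        self._rows = rows
        self._bind_attr = bind_attr  # attribute compared against the bound value
        self._bound: Any = None
        self._pos = 0

    def bind(self, value: Any) -> None:
        """Supply the outer value for a bind plan, then restart the scan."""
        self._bound = value
        self.reset()

    def reset(self) -> None:
        """Restart the scan from the first row."""
        self._pos = 0

    def advance(self) -> Optional[Row]:
        """Return the next qualifying row, or None when exhausted."""
        while self._pos < len(self._rows):
            row = self._rows[self._pos]
            self._pos += 1
            if self._bind_attr is None or row[self._bind_attr] == self._bound:
                return row
        return None


it = ListIterator([{"name": "Athens"}, {"name": "Paros"}], bind_attr="name")
it.bind("Paros")
print(it.advance())  # prints {'name': 'Paros'}
print(it.advance())  # prints None
```

With a uniform interface like this, the executor can treat every leaf of the plan tree the same way regardless of what kind of source sits behind the wrapper.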

Wrapper Details

• Interface files include the GDL
• Environment files include parameters specific to wrappers
• Libraries
    – Core, shared among several wrappers
    – Implementation, specific to repositories
• Dynamically loaded code
• Same address space as Garlic

Odds and Ends

• How easy is it to write a wrapper?
    – Summer student, chemist, and many wrappers written.
• Related Work
    – TSIMMIS
        • Uses QDTL, a declarative spec for supported queries
    – DISCO
        • Language for describing capabilities
        • Partial queries

Good and Bad

• Good
    – Leverages existing query facilities
    – Handles idiosyncrasies
    – Graceful growth and evolution
• Bad
    – How easy is it to write wrappers?
    – How unstructured can my repository be?
    – Optimization
        • Centralized vs. Local
        • Selectivity estimation?

The Ugly

• Cost model for diverse set of sources

• Handling failures
    – Unavailable sources
    – Wrappers are buggy and often wrong
    – Want graceful degradation on failures

• Replication