Don’t Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources
Mary Tork Roth, Peter Schwarz
IBM Almaden
Road Map
• Motivation
• Garlic Overview
• Wrapper Architecture
  – Data Definition
  – Query Planning
  – Query Execution
• Good, Bad, and Ugly
Motivation
• “Real Companies”
• Heavy investment in legacy
  – Data management wares
  – Application woes
• Need an integrated view of heterogeneous data sources
  – Leverage existing query facilities
  – Work around idiosyncrasies
Garlic Architecture
[Figure: Clients connect to the Garlic Query Processor, which uses Garlic Metadata; the Relational DB, Object DB, Image Archive, and Complex Objects repositories each sit behind a Wrapper.]
Wrapper Goals
• Small start-up cost
  – Wizards are not the only ones writing
• Incremental growth
  – Wrappers must be able to evolve
  – Add new sources without disturbing existing ones
• Must be able to optimize queries
  – Enable participation, not delegation
Wrapper Overview
[Figure: the Wrapper in front of a Data Source, with three groups of labels: Garlic Objects and Method Invocation; Planning, with Work Request, Wrapper Plan, and Query Plan; Execution, with Execution Plan and Iterator.]
Modeling Data
• Object Data Model
  – Interface and Implementation
  – GDL variant of ODMG-ODL
• Wrapper assigns IDs to objects
  – OID = IID + key
• Methods
  – default accessor methods
  – stub and generic dispatch
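The OID = IID + key rule can be sketched as follows. This is a minimal illustration, assuming an OID is simply the pair of an implementation ID and a wrapper-chosen key; the class, the function names, and the example IID strings are hypothetical and are not Garlic's actual representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OID:
    """Hypothetical object ID: implementation ID plus a wrapper-assigned key."""
    iid: str  # names the implementation the object belongs to (assumed format)
    key: str  # identifies the object within that implementation, e.g. a primary key


def route(oid, wrappers):
    """Find the wrapper responsible for an object by looking at the IID alone."""
    return wrappers[oid.iid]


# Hypothetical registry mapping implementation IDs to wrappers.
wrappers = {"country_impl": "relational wrapper", "image_impl": "image wrapper"}
greece = OID("country_impl", "GR")
print(route(greece, wrappers))
```

Keeping the key opaque to everything but the owning wrapper is the design point here: only the IID needs to be understood outside the wrapper.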
Modeling Data Example
interface Country {
    attribute string name;
    attribute string airlines_served;
    attribute boolean visa_required;
    attribute Image scene;
}
interface Image {
    attribute readonly string file_name;
    double matches(in string file_name);
    void display(in string device_name);
}
Query Planning
• Like System-R, bottom-up dynamic programming
• Wrapper tells what it can do through methods
  – plan_access() for single collections
  – plan_join() for multi-way joins
  – plan_bind() for inner streams of joins
• Input: work request
• Output: set of plans, cost, cardinalities?
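To make "bottom-up dynamic programming" concrete, here is a toy enumerator in the System-R style. It is only a sketch under stated assumptions: the per-collection costs and cardinalities stand in for what a wrapper's plan_access() might report, the join cost function is invented, cardinality is a plain cross product with no selectivity, and only left-deep plans are considered. None of these numbers or names come from Garlic.

```python
from itertools import combinations


def enumerate_plans(access, join_cost):
    """Return (cost, cardinality, plan) for the cheapest left-deep join order.

    access: {collection: (cost, cardinality)}, assumed to come from the wrappers.
    join_cost(outer_card, inner_card): assumed cost of one join step.
    """
    # Level 1: the best plan for each single collection.
    best = {frozenset([c]): (cost, card, c) for c, (cost, card) in access.items()}
    names = sorted(access)
    # Levels 2..n: build each larger set from the best plan of a smaller set.
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            for inner in subset:  # the collection joined last
                o_cost, o_card, o_plan = best[s - {inner}]
                i_cost, i_card, _ = best[frozenset([inner])]
                cost = o_cost + i_cost + join_cost(o_card, i_card)
                card = o_card * i_card  # toy estimate: no selectivity applied
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, card, (o_plan, inner))
    return best[frozenset(names)]


# Made-up inputs for illustration only.
access = {"Countries": (10.0, 1), "Cities": (50.0, 100), "Hotels": (20.0, 10)}
cost, card, plan = enumerate_plans(access, lambda outer, inner: outer * inner)
print(cost, plan)
```

The point of the sketch is the shape of the search: plans for larger sets of collections are only ever assembled from the best plans already found for smaller sets.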
Single Collections
• Work Request
  – Attributes to project upon
  – Selections, and methods to invoke
• Wrapper response
  – Which projections, selections it supports
  – Cost of plan
  – Instances of Wrapper_Plan class
  – Include private data for plan execution
  – Execute a plan which subsumes request?
Single Collection Access Plan
select H.name, H.city, H.daily_rate
from Hotels H
where H.class = 5 and H.loc = ‘beach’
[Figure: the Garlic Optimizer sends a Work Request to the Web Wrapper, which fronts the Hotel Repository; the wrapper answers with a Wrapper Access Plan.]
Work Request
  Project: H.OID, H.name, H.city, H.daily_rate, H.class, H.loc
  Preds: H.class = 5, H.loc = ‘beach’
Wrapper Access Plan - Wrapper_Plan class
  Properties
    Project: H.OID, H.name, H.city, H.daily_rate, H.class, H.loc
    Preds: H.class = 5
    Cost: <access cost>
  Plan details (private)
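The hotel exchange above can be sketched in code. This is an illustration under assumptions, not Garlic's API: the WorkRequest and WrapperPlan classes, the tuple encoding of predicates, the fixed cost, and the contents of the private field are all hypothetical. What it shows is the idea from the slide: the wrapper accepts only the predicate it can evaluate (H.class = 5), and the leftover predicate (H.loc = ‘beach’) stays with Garlic.

```python
from dataclasses import dataclass, field


@dataclass
class WorkRequest:
    project: list  # attributes to project
    preds: list    # predicates as (attribute, operator, value) tuples (assumed encoding)


@dataclass
class WrapperPlan:
    project: list
    preds: list    # the predicates the wrapper agrees to evaluate
    cost: float
    private: dict = field(default_factory=dict)  # opaque to the optimizer


class WebWrapper:
    """Hypothetical wrapper that can filter hotels on class but not on loc."""
    SUPPORTED_PRED_ATTRS = {"H.class"}

    def plan_access(self, request):
        handled = [p for p in request.preds if p[0] in self.SUPPORTED_PRED_ATTRS]
        plan = WrapperPlan(
            project=list(request.project),
            preds=handled,
            cost=100.0,  # placeholder standing in for <access cost>
            private={"params": {attr: val for attr, _, val in handled}},
        )
        return [plan]


def residual_preds(request, plan):
    """Predicates the optimizer must still apply after the wrapper's plan runs."""
    return [p for p in request.preds if p not in plan.preds]


request = WorkRequest(
    project=["H.OID", "H.name", "H.city", "H.daily_rate", "H.class", "H.loc"],
    preds=[("H.class", "=", 5), ("H.loc", "=", "beach")],
)
plans = WebWrapper().plan_access(request)
leftover = residual_preds(request, plans[0])
print(plans[0].preds, leftover)
```

Note that the request projects H.class and H.loc even though the query does not select them, which is what lets the optimizer apply a leftover predicate itself.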
Join Plans
• Request
  – Plans to join
  – Join Predicate
• Wrapper response
  – Join plan with supported predicates
  – Cost of join
Join Plans
select I.name
from Countries C, Cities I
where C.name = ‘Greece’ and I.pop < 500 and I.country = C.OID
[Figure: the Garlic Optimizer sends a Work Request to the Relational Wrapper, which fronts the Relational DB; the wrapper answers with a Wrapper Join Plan.]
Work Request
  Input Plans
    Wrapper Access Plan
      Project: C.OID, C.name
      Preds: C.name = ‘Greece’
      Cost: <xx>
      Plan details (private)
    Wrapper Access Plan
      Project: I.OID, I.name...
      Preds: I.pop < 500
      Cost: <xx>
      Plan details (private)
  Join pred: I.country = C.OID
Wrapper Join Plan - Countries, Cities
  Project: C.OID, C.name, I.OID, I.name, I.pop, I.country
  Preds: C.name = ‘Greece’, I.pop < 500, I.country = C.OID
  Cost: <join cost>
  Plan details (private)
Inter Site Joins
select C.pop, H.name
from Cities C, Hotels H
where C.name = H.loc
Site A: Cities - C
Site B: Hotels - H
[Figure: three panels, each showing sites A and B and Garlic. Panel labels: (1) H; H, C. (2) H, C; H, C. (3) Hsub; Hsub.loc; Hsub, C.]
Bind Plans
• Inter-wrapper join
• Fetch matches
  – Values produced by outer node
  – Inner node invoked for each value or set of values
  – Like a semijoin or filter join
• Same request and reply pairs
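The bind-join idea above can be sketched with the Cities and Hotels query from the Inter Site Joins slide. This is a hedged illustration: the function name, the row encoding as dictionaries, the batching parameter, and all data values are made up for the example. The outer side (Hotels) produces H.loc values, and the inner side (Cities) is invoked once per value or once per set of values.

```python
def bind_join(outer_rows, inner_fetch, outer_attr, inner_attr, batch_size=1):
    """Join by sending outer values to the inner source, batch_size rows at a time."""
    results = []
    rows = list(outer_rows)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = {row[outer_attr] for row in batch}  # bindings for this call
        matches = inner_fetch(values)                # one inner invocation per batch
        for outer in batch:
            for inner in matches:
                if inner[inner_attr] == outer[outer_attr]:
                    results.append({**outer, **inner})
    return results


# Made-up rows for illustration only.
hotels = [
    {"H.name": "Sea View", "H.loc": "Athens"},
    {"H.name": "Harbor Inn", "H.loc": "Patras"},
    {"H.name": "Nowhere Lodge", "H.loc": "Atlantis"},
]
cities = [
    {"C.name": "Athens", "C.pop": 300},
    {"C.name": "Patras", "C.pop": 20},
]
calls = []  # records each set of bindings sent to the inner source


def fetch_cities(names):
    calls.append(set(names))
    return [c for c in cities if c["C.name"] in names]


joined = bind_join(hotels, fetch_cities, "H.loc", "C.name", batch_size=2)
print(len(calls), [row["H.name"] for row in joined])
```

With batch_size=1 the inner source is invoked once per outer row; larger batches trade fewer invocations for larger requests.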
Query Execution
• Garlic plan looks like tree with wrapper plans as leaves
• Wrapper exports iterator interface
  – Translate plan into iterator
  – Methods supported
    • reset()
    • advance()
    • bind()
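A minimal sketch of an iterator with the three methods named above, over an in-memory list. The semantics are assumptions made for illustration: here reset() restarts the scan, advance() returns the next matching row or None when exhausted, and bind() supplies a new value and restarts. The class name and data are hypothetical.

```python
class ListIterator:
    """Hypothetical wrapper iterator over an in-memory list of rows."""

    def __init__(self, rows, bind_attr=None):
        self._rows = rows
        self._bind_attr = bind_attr  # attribute compared against the bound value
        self._bound = None
        self._pos = 0

    def reset(self):
        """Restart the scan from the first row (assumed semantics)."""
        self._pos = 0

    def bind(self, value):
        """Supply a new binding and restart (assumed semantics)."""
        self._bound = value
        self.reset()

    def advance(self):
        """Return the next matching row, or None when the scan is exhausted."""
        while self._pos < len(self._rows):
            row = self._rows[self._pos]
            self._pos += 1
            if self._bind_attr is None or row[self._bind_attr] == self._bound:
                return row
        return None


def drain(iterator):
    """Collect every remaining row from an iterator."""
    rows = []
    row = iterator.advance()
    while row is not None:
        rows.append(row)
        row = iterator.advance()
    return rows


# Made-up rows for illustration only.
city_rows = [{"C.name": "Athens", "C.pop": 300}, {"C.name": "Patras", "C.pop": 20}]
it = ListIterator(city_rows, bind_attr="C.name")
it.bind("Athens")
first_pass = drain(it)
it.reset()
second_pass = drain(it)
it.bind("Atlantis")
no_match = drain(it)
print(first_pass, no_match)
```

A uniform iterator at every leaf is what lets the plan tree be driven the same way regardless of which source sits underneath.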
Wrapper Details
• Interface files include the GDL
• Environment files include parameters specific to wrappers
• Libraries
  – Core, shared among several wrappers
  – Implementation, specific to repositories
• Dynamically loaded code
• Same address space as Garlic
Odds and Ends
• How easy is it to write a wrapper?
  – Summer student, chemist, and many wrappers written.
• Related Work
  – TSIMMIS
    • Uses QDTL, a declarative spec for supported queries
  – DISCO
    • Language for describing capabilities
    • Partial queries
Good and Bad
• Good
  – Leverages existing query facilities
  – Handles idiosyncrasies
  – Graceful growth and evolution
• Bad
  – How easy is it to write wrappers?
  – How unstructured can my repository be?
  – Optimization
    • Centralized vs. Local
    • Selectivity estimation?
The Ugly
• Cost model for diverse set of sources
• Handling failures
  – Unavailable sources
  – Wrappers are buggy and often wrong
  – Want graceful degradation on failures
• Replication