scaling heterogeneous databases and design of disco anthony tomasic louiqa raschid patrick valduriez...
Post on 05-Jan-2016
Scaling Heterogeneous Databases and Design of DISCO
Anthony Tomasic
Louiqa Raschid
Patrick Valduriez
Presented by:
Nazia Khatir
Texas A&M University
Distributed Information Search COmponent (DISCO)
The distributed mediator architecture of DISCO
Query processing semantics
Data models
The interface to underlying data sources
Introduction
Access to a large number of heterogeneous, distributed data sources introduces new problems:
End users and application programmers
Unavailable data sources
• To answer a query involving n databases, all n databases must be available; otherwise, either no answer or only a partial answer is returned.
• The availability of answers declines as the number of databases rises.
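As a rough illustration of why availability declines, suppose each source is up independently with the same probability p. This independence assumption is not stated in the slides. A query that needs all n sources then gets a complete answer with probability p^n. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
def prob_all_available(p: float, n: int) -> float:
    """Probability that all n sources respond, assuming each one is
    independently available with probability p (illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


# Even with sources that are each up 99% of the time, the chance of a
# complete answer shrinks steadily as more sources are involved.
for n in (1, 10, 100):
    print(n, prob_all_available(0.99, n))
```

Under this model, the chance of a complete answer can only fall as n grows, which is the scaling problem DISCO's query semantics address.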
Introduction (Cont.)
Access to a large number of heterogeneous, distributed data sources introduces new problems:
Database administrators (DBA)
Incorporating new sources into the model
• Schemas must be changed
• Catalogs must be updated
• New definitions must be added
Introduction (Cont.)
Access to a large number of heterogeneous, distributed data sources introduces new problems:
Database implementors (DBI)
Translation of queries between query languages and schemas
• New code must be written
DISCO Architecture
A : Application
M : Mediator
C : Catalog
W : Wrapper
D : Data Source
Arcs represent exchange of queries and answers
Applications (A)
Written by application programmers
Access a uniform representation of the underlying sources through a uniform query language
Mediators (M)
Permit a collection of databases to be accessed in a uniform way
Accept queries and transform them into sub-queries
Keep summary information about their associated databases as state
Catalogs (C)
Special mediators
Keep track of the collection of databases, wrappers, and mediators
Provide an overview of the entire system
Wrappers (W)
Deal with the heterogeneous nature of databases
Transform sub-queries
Map from the general query language, used by mediators, to the source query language
Reformat answers (data) as appropriate for each mediator
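The two wrapper duties above (translating a sub-query and reshaping the answer) can be sketched in Python. This is an illustration only, not DISCO's actual wrapper code: the SubQuery shape, the SqlWrapper class, and the table name are all hypothetical, and the SQL is built by plain string formatting with no escaping.

```python
from dataclasses import dataclass


@dataclass
class SubQuery:
    """Simplified mediator sub-query: project one attribute, filter on another."""
    extent: str       # mediator extent name, e.g. "person0"
    project: str      # attribute to return
    filter_attr: str  # attribute compared against a constant
    threshold: int    # keep objects where filter_attr > threshold


class SqlWrapper:
    """Hypothetical wrapper for an SQL source."""

    def __init__(self, table_for):
        self.table_for = table_for  # extent name -> source table name

    def translate(self, q):
        # Map the mediator's general query form to the source query language.
        table = self.table_for[q.extent]
        return f"SELECT {q.project} FROM {table} WHERE {q.filter_attr} > {q.threshold}"

    def reformat(self, rows):
        # Reshape source rows (1-tuples) into the flat bag the mediator expects.
        return [row[0] for row in rows]


wrapper = SqlWrapper({"person0": "person"})
sql = wrapper.translate(SubQuery("person0", "name", "salary", 10))
names = wrapper.reformat([("Mary",)])
```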
Features of DISCO
For application programmers
Provides new query processing semantics to ease dealing with unavailable data sources
Features of DISCO
For DBA
Models data sources as objects, which permits powerful modeling capability
Supports type transformations to ease the incorporation of new data sources into a mediator
Features of DISCO
For DBI
Provides a flexible wrapper interface to ease the construction of wrappers
Data Model
Figure: the person type and its person extent, which covers the extents person0, person1, and person2; the data sources shown are r0 (Mary, 200), r1 (Sam, 150), and r2.
select x.name
from x in person
where x.salary > 10
The answer is: Bag("Mary", "Sam"), of type Bag.
• (Programmer viewpoint) The same query would access the third data source as well.
• (DBA viewpoint) The model supports dissimilar structures.
Wrapper Interface
DISCO provides a flexible wrapper interface for the DBI.
The interface to wrappers is at the level of an abstract algebraic machine (AM) of logical operators. The DBI implements the logical operators, plus a call in the wrapper interface that returns the grammar.
• During query processing, the mediator generates a logical expression.
• The mediator calls the interface to get the grammar and checks that the logical expression matches the grammar.
Figure: the mediator and the wrapper communicate through the interface (algebraic machine).
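The check described above can be sketched as follows. This is a deliberately simplified Python model, not DISCO's interface: the "grammar" is reduced to the set of logical operator names a wrapper accepts, a logical expression is a nested tuple, and all class and function names are hypothetical.

```python
# A logical expression is a nested tuple (operator, child, ...);
# leaves are plain strings such as extent names.
def operators(expr):
    """Yield every operator name used in a logical expression tree."""
    if isinstance(expr, tuple):
        yield expr[0]
        for child in expr[1:]:
            yield from operators(child)


class Wrapper:
    def __init__(self, supported):
        self._supported = frozenset(supported)

    def grammar(self):
        # Simplification: the grammar is just the set of accepted operators.
        return self._supported


def matches(expr, wrapper):
    """Mediator-side check: does the wrapper accept every operator in expr?"""
    accepted = wrapper.grammar()
    return all(op in accepted for op in operators(expr))


scan_only = Wrapper({"scan"})
full = Wrapper({"scan", "select", "project"})
expr = ("project", ("select", ("scan", "person0")))
```

With this model, `matches(expr, full)` holds, while `matches(expr, scan_only)` does not, so the mediator would have to send the weaker wrapper only the scan and evaluate the rest itself.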
Mediator Data Model
Extensions to the ODMG standard
ODMG (Object Data Management Group)
• Object Data Model
• Object Definition Language (ODL)
• Object Query Language (OQL)
• Language bindings
Mediator Data Model
Extensions to the ODMG standard
Object Data Model
• interface: defines a type signature for an object
• extent: automatically maintains the collection of objects of the interface, i.e. an extent is a named variable whose value is the collection of all objects of the associated interface. When objects are created or destroyed, the extent is updated automatically.
Mediator Data Model
Extensions to the ODMG standard
Object Definition Language (ODL)
• wrapper: models wrappers
• repository: the address of a database or some other type of repository, which can contain several data sources. Each data source in a repository is associated with an extent.
Mediator Data Model
Extensions to the ODMG standard
Define access to a data source
1. Create an instance of the repository type:
r0 := Repository (host = "rodin.inria.fr", name = "db", address = "123.45.6.7")
2. Locate the wrapper (written by a database implementor):
w0 := WrapperPostgres ();
3. Define the interface (type) in the mediator that corresponds to the data source objects, e.g. the Person type corresponds to the objects in data sources r0 and r1:
interface Person { attribute String name; attribute Short salary; }
4. Specify the extent of this mediator type, which accesses r0 using the w0 wrapper:
extent person0 of Person wrapper w0 repository r0;
Each DISCO extent represents a collection of data in one data source.
Mediator Data Model
Extensions to the ODMG standard
Data access from the data source
The query
select x.name
from x in person0
where x.salary > 10
returns the answer Bag("Mary")
Addition of a new extent of Person type:
extent person1 of Person wrapper w0 repository r1;
To access objects in both data sources, the query
select x.name
from x in union (person0, person1)
where x.salary > 10
returns the answer Bag("Mary", "Sam")
Advantage: the extents are referred to explicitly
Disadvantage: queries are difficult to express when the extents are not explicitly specified
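The two queries above can be mimicked in Python to make the bag semantics concrete. This is a sketch only: extents are modeled as in-memory lists holding the rows shown on the slides, and the helper names are hypothetical.

```python
person0 = [{"name": "Mary", "salary": 200}]  # objects from repository r0
person1 = [{"name": "Sam", "salary": 150}]   # objects from repository r1


def union(*extents):
    """Bag union: concatenate the extents, keeping duplicates."""
    return [obj for extent in extents for obj in extent]


def names_with_salary_above(extent, limit):
    # Mirrors: select x.name from x in <extent> where x.salary > <limit>
    return [x["name"] for x in extent if x["salary"] > limit]


only_r0 = names_with_salary_above(person0, 10)
both = names_with_salary_above(union(person0, person1), 10)
```

Here `only_r0` corresponds to Bag("Mary") and `both` to Bag("Mary", "Sam"). Note that the caller has to list every extent by name, which is the disadvantage noted above.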
Mediator Data Model
Extensions to the ODMG standard
Solution: MetaExtent keeps the details of the extents of all the mediator types. General format of the MetaExtent type, which is created automatically:
interface MetaExtent (extent metaextent) {
  attribute String name;
  attribute Extent e;
  attribute Type interface;
  attribute Wrapper wrapper;
  attribute Repository repository;
  attribute Map map;
}
Query definition expression of the extent person:
interface Person (extent person) { attribute String name; attribute Short salary; }
define person as
flatten (select x.e from x in metaextent where x.interface = Person)
Thus, a query on person dynamically accesses all the extents defined for the type Person.
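A minimal Python sketch of the same idea, assuming extents are in-memory lists: the metaextent is a list of records, and the implicit extent of a type is computed by filtering on the interface and flattening. The class and function names are hypothetical, and the map attribute is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class MetaExtentEntry:
    name: str        # extent name, e.g. "person0"
    e: list          # the extent's objects (fetched through a wrapper in practice)
    interface: str   # name of the mediator type
    wrapper: str
    repository: str


metaextent = [
    MetaExtentEntry("person0", [{"name": "Mary", "salary": 200}], "Person", "w0", "r0"),
    MetaExtentEntry("person1", [{"name": "Sam", "salary": 150}], "Person", "w0", "r1"),
]


def implicit_extent(interface):
    """flatten(select x.e from x in metaextent where x.interface = interface)"""
    return [obj for x in metaextent if x.interface == interface for obj in x.e]


person = implicit_extent("Person")
```

Because `implicit_extent` reads the metaextent each time it is called, registering a further Person extent changes the result without any change to queries written against person.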
Mediator Data Model
Matching similar and dissimilar structures or substructures
The DBA defines how data from multiple data sources is aggregated:
• Matching similar substructures: subtype
• Matching similar structures: map
• Matching dissimilar structures: view
Mediator Data Model
Matching similar substructures
Subtyping (ODMG standard)
Example:
The Student interface is a subtype of Person, and the DBA defines two extents as follows:
interface Student : Person { }
extent student0 of Student wrapper w0 repository r2;
extent student1 of Student wrapper w0 repository r3;
The person extent still contains only person0 and person1. It does not automatically reference the extents of its subtypes in the subtype hierarchy.
DISCO solution: the special syntax person* refers to the extents of Person and of all its subtypes.
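The difference between person and person* can be sketched in Python as a lookup with and without the subtype hierarchy. The tables and function names below are hypothetical stand-ins for the mediator's catalog, not DISCO's implementation.

```python
# interface Student : Person { }
SUPERTYPE = {"Person": None, "Student": "Person"}

# extent name -> interface it was declared with
EXTENTS = {
    "person0": "Person",
    "person1": "Person",
    "student0": "Student",
    "student1": "Student",
}


def is_subtype(t, ancestor):
    """True if t is ancestor or inherits from it, directly or indirectly."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE[t]
    return False


def extents_of(interface, include_subtypes=False):
    """person -> include_subtypes=False; person* -> include_subtypes=True."""
    if include_subtypes:
        return sorted(n for n, t in EXTENTS.items() if is_subtype(t, interface))
    return sorted(n for n, t in EXTENTS.items() if t == interface)


plain = extents_of("Person")
starred = extents_of("Person", include_subtypes=True)
```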
Mediator Data Model
Matching similar structures
Mapping example:
interface PersonPrime { attribute String n; attribute Short s; }
extent personprime0 of PersonPrime wrapper w0 repository r0;
Since the objects returned from r0 are of type Person, the extent personprime0 has a type conflict with the objects returned. To avoid a run-time error, DISCO allows the DBA to resolve this type conflict.
Mediator Data Model
Matching similar structures
Mapping example (Cont.):
The type conflict is resolved by specifying a mapping between a mediator type and a data source type. The mapping function is called the local transformation map. The extent definition becomes:
extent personprime0 of PersonPrime wrapper w0 repository r0
map ((person0=personprime0), (name = n), (salary = s));
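In effect, the local transformation map renames source attributes to the attributes of the mediator type. A minimal Python sketch of that renaming, assuming source objects are dictionaries (the function and variable names are hypothetical):

```python
# map ((person0=personprime0), (name = n), (salary = s))
LOCAL_MAP = {"name": "n", "salary": "s"}  # source attribute -> mediator attribute


def to_mediator(obj, attr_map):
    """Rename a source object's attributes to those of the mediator type."""
    missing = set(attr_map) - set(obj)
    if missing:
        raise KeyError(f"source object lacks attributes: {sorted(missing)}")
    return {attr_map[key]: obj[key] for key in attr_map}


source_objects = [{"name": "Mary", "salary": 200}]  # objects returned from r0
personprime0 = [to_mediator(obj, LOCAL_MAP) for obj in source_objects]
```

Raising an error on a missing attribute at translation time is a design choice of this sketch; it surfaces a bad map early instead of producing objects that fail later in query evaluation.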
Mediator Data Model
Matching dissimilar structures
View in DISCO
Example:
interface PersonTwo {
  attribute String name;
  attribute Short regular;
  attribute Short consult;
}
extent persontwo0 of PersonTwo wrapper w0 repository r5;
View definition to aggregate over the data sources:
define personnew as
bag (select struct (name : x.name, salary : x.salary) from x in person,
     select struct (name : x.name, salary : x.regular + x.consult) from x in persontwo0)
Views can reference other views but are not updatable.
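The personnew view can be sketched in Python as a function that is recomputed on every call. The person rows come from the earlier slides; the persontwo0 row is made up for illustration, since the slides give no sample data for repository r5.

```python
person = [{"name": "Mary", "salary": 200}, {"name": "Sam", "salary": 150}]

# Hypothetical row: the slides show no data for repository r5.
persontwo0 = [{"name": "Ann", "regular": 100, "consult": 20}]


def personnew():
    """Read-only view: reconcile the two structures into (name, salary)."""
    from_person = [{"name": x["name"], "salary": x["salary"]} for x in person]
    from_persontwo = [
        {"name": x["name"], "salary": x["regular"] + x["consult"]}
        for x in persontwo0
    ]
    return from_person + from_persontwo


view_rows = personnew()
```

The function returns fresh dictionaries, so changing the result does not touch the underlying extents, which loosely mirrors the fact that views are not updatable.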
Mediator Query Processing
Query Processing With Unavailable Data
There are three possibilities if a data source does not respond:
1. The system waits
2. The system assumes that the unavailable source does not exist, or that it has no matching tuples
3. The system returns a partial answer
DISCO applies partial evaluation semantics to queries, processing as much of the query as possible from the information that is available. Thus, the answer to a query may be another query.
Assume r0 does not respond:
select x.name
from x in person
where x.salary > 10
The answer is:
union (select y.name from y in person0 where y.salary > 10, Bag("Sam"))
• select y.name from y in person0 where y.salary > 10: partial answer (query)
• Bag("Sam"): partial answer (data)
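The partial evaluation behavior can be sketched in Python: the available extents are evaluated to data, and the unavailable ones are kept as residual queries. This is an illustrative model with hypothetical names, not DISCO's query processor; the residual query is represented simply as OQL text.

```python
class Unavailable(Exception):
    """Raised when a data source does not respond."""


def make_source(rows, up=True):
    def fetch():
        if not up:
            raise Unavailable()
        return rows
    return fetch


def evaluate(extents, limit):
    """Return (residual_queries, data) for
    select x.name from x in person where x.salary > limit."""
    residual, data = [], []
    for name, fetch in extents.items():
        try:
            rows = fetch()
        except Unavailable:
            # Keep the unanswered part as a query to be evaluated later.
            residual.append(
                f"select y.name from y in {name} where y.salary > {limit}")
            continue
        data.extend(x["name"] for x in rows if x["salary"] > limit)
    return residual, data


extents = {
    "person0": make_source([{"name": "Mary", "salary": 200}], up=False),  # r0 is down
    "person1": make_source([{"name": "Sam", "salary": 150}]),
}
residual, data = evaluate(extents, 10)
```

The pair `(residual, data)` plays the role of the union in the answer above: one pending sub-query over person0 plus the data already obtained from person1.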
Conclusion
The design of DISCO provides solutions to some of the problems encountered when scaling the number of data sources in heterogeneous distributed databases:
• Partial evaluation query semantics, for application programmers (AP)
• Data modeling tools, for the DBA
• Flexible wrapper interface, for the DBI