

Post on 31-Dec-2015


Navigational Plans For Data Integration

Marc Friedman, Alon Levy, Todd Millstein

Presented By

Avinash Ponnala

Introduction

• Data integration with webs of data as sources.

• Previous work is inappropriate for incorporating data webs as sources in data integration.

• Data integration systems pose many hard technical problems.

• Because of the growing number of sources, they should be modeled as webs of data.

GOAL

• A procedure for modeling data webs, i.e., for incorporating them into a data integration system.

• The GLAV language for source descriptions.

• An algorithm for reformulating user queries into execution plans that both query and navigate the data sources.

Incorporating Data Webs

• A data web consists of pages and links between them.

• The structure of a data web is represented with a web schema.

• In a web schema:
  Nodes represent sets of pages.
  Directed edges represent sets of directed links between them.
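A web schema is essentially a directed graph over node types. The Python sketch below is one way to hold it in memory, assuming nodes are identified by name strings; the class `WebSchema`, its methods, and the node names other than `Univ` are hypothetical. It also shows one use of the structure: listing which node sets can be reached by following links from a given node.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class WebSchema:
    """Nodes stand for sets of pages; directed edges stand for sets of links."""
    nodes: set[str] = field(default_factory=set)
    edges: dict[str, set[str]] = field(default_factory=dict)  # node -> nodes it links to

    def add_edge(self, src: str, dst: str) -> None:
        self.nodes.update((src, dst))
        self.edges.setdefault(src, set()).add(dst)

    def reachable_from(self, start: str) -> set[str]:
        """Node sets reachable from `start` by following links (breadth-first search)."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

# Hypothetical schema: university home page -> college pages -> department pages.
schema = WebSchema()
schema.add_edge("Univ", "College")
schema.add_edge("College", "Dept")
print(sorted(schema.reachable_from("Univ")))  # ['College', 'Dept', 'Univ']
```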

Example of a Web Schema

• Univ represents the home page of the university.

• Univ(u1) denotes the home page object of university u1.

• Every website has a set of entry points, i.e., nodes.

• The data integration system can access the entry points directly by URL.

• Three kinds of logical information are stored on each page:
  1) Ordinary contents of the page: $p(Y_1, \ldots, Y_k)$.
  2) Outgoing edges from the page: $p(X, Y) \rightarrow M(Y)$.
  3) Search forms on the page: $p(X, Y) \rightarrow M(Y)$.

• Search forms map binary relations to other pages.
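The three kinds of page information can be pictured as a small record per page object. This is a minimal sketch under stated assumptions: the classes `PageRef` and `Page` and the relation names `univName`, `hasCollege`, and `findProf` are hypothetical, and a search form is assumed to take an input value Y and return the page M(Y).

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class PageRef:
    """A page object such as Univ(u1): a web-schema node applied to a constant."""
    node: str
    key: str

@dataclass
class Page:
    ref: PageRef
    # 1) Ordinary contents: facts p(Y1, ..., Yk), grouped by relation name.
    contents: dict[str, set[tuple]] = field(default_factory=dict)
    # 2) Outgoing edges p(X, Y) -> M(Y): relation name -> (target node M, the Y values linked to).
    links: dict[str, tuple[str, list[str]]] = field(default_factory=dict)
    # 3) Search forms p(X, Y) -> M(Y): relation name -> target node M (Y is supplied as input).
    forms: dict[str, str] = field(default_factory=dict)

    def follow(self, relation: str) -> list[PageRef]:
        """Pages reached by following every link labelled with `relation`."""
        target, values = self.links.get(relation, ("", []))
        return [PageRef(target, y) for y in values]

    def submit(self, relation: str, value: str) -> PageRef:
        """Page returned by filling in the search form for `relation` with `value`."""
        return PageRef(self.forms[relation], value)

# Hypothetical home page Univ(u1) with one fact, two college links, and one search form.
home = Page(
    PageRef("Univ", "u1"),
    contents={"univName": {("u1", "Example University")}},
    links={"hasCollege": ("College", ["c1", "c2"])},
    forms={"findProf": "Prof"},
)
colleges = home.follow("hasCollege")
result_page = home.submit("findProf", "smith")
```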


Mediated Schemas

• A mediated schema is a set of relations that serves as a uniform query interface for all sources.

• Here is an example of a mediated schema for our university domain:
  collegeOf(College, University)
  deptOf(Department, College)
  profOf(Professor, Department)
  courseOf(Course, Department)
  chairOf(Professor, Department)
  prereqOf(Course, Course)

• The user poses queries in terms of the relations and attributes of a mediated database schema.

• The relations in the mediated schema are virtual.

• The mediated schema captures the aspects of the domain that are of interest to the users of the application.
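Because the mediated relations are virtual, the system never stores them directly; still, it can help to see what a user query over the schema means. The sketch below evaluates the conjunctive query q(P) :- profOf(P, D), deptOf(D, C), collegeOf(C, u1) (professors at university u1) against a hypothetical toy instance, treating terms that start with an upper-case letter as variables. The data and the function `evaluate` are illustrative assumptions, not part of the described system.

```python
def is_var(term: str) -> bool:
    """Terms starting with an upper-case letter are variables; others are constants."""
    return term[:1].isupper()

def evaluate(head_vars, body, db):
    """Answers to q(head_vars) :- body, found by joining the body atoms left to right."""
    answers = set()

    def search(i, binding):
        if i == len(body):
            answers.add(tuple(binding[v] for v in head_vars))
            return
        rel, args = body[i]
        for fact in db.get(rel, set()):
            new = dict(binding)
            consistent = True
            for term, value in zip(args, fact):
                if is_var(term):
                    # Bind the variable, or check it against an earlier binding.
                    if new.setdefault(term, value) != value:
                        consistent = False
                        break
                elif term != value:  # a constant must match exactly
                    consistent = False
                    break
            if consistent:
                search(i + 1, new)

    search(0, {})
    return answers

# Hypothetical toy instance; in the real system these relations are virtual.
db = {
    "collegeOf": {("eng", "u1"), ("arts", "u2")},
    "deptOf": {("cs", "eng"), ("history", "arts")},
    "profOf": {("p1", "cs"), ("p2", "history")},
}
# q(P) :- profOf(P, D), deptOf(D, C), collegeOf(C, u1)
body = [("profOf", ("P", "D")), ("deptOf", ("D", "C")), ("collegeOf", ("C", "u1"))]
answers = evaluate(["P"], body, db)
```

In the real system, the toy instance is replaced by source relations, which the logical and navigational plans populate from the data webs.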

Source Descriptions

• Why source descriptions?

• Sample source description

• The mediated schema relations do not match the source relations one-to-one because:
  1) source schemas contain different levels of detail from each other, and
  2) attributes are split into relations differently.

• In addition to the mediated schema, the system has a set of source descriptions that specify a semantic mapping between the mediated schema and the source schemas.

• The mismatch problem can be addressed with the GAV and LAV source description languages.

• A LAV source description has the form

$$v(\bar{X}) \Rightarrow r_1(\bar{X}_1,\bar{Z}_1) \wedge \cdots \wedge r_k(\bar{X}_k,\bar{Z}_k)$$

where v is a source relation and the $r_i$ are mediated-schema relations. With LAV, the mediated schema can contain details that are not present in every source.

• A GAV source description has the form

$$v_1(\bar{X}_1,\bar{Y}_1) \wedge \cdots \wedge v_j(\bar{X}_j,\bar{Y}_j) \Rightarrow r(\bar{X})$$

• Using either language alone has undesirable consequences.

• Neither offers enough flexibility on its own.

• GLAV combines the expressive power of both GAV and LAV.

• A GLAV source description has the form

$$\bar{V}(\bar{X},\bar{Y}) \Rightarrow r_1(\bar{X}_1,\bar{Z}_1) \wedge \cdots \wedge r_k(\bar{X}_k,\bar{Z}_k)$$

where $\bar{V}(\bar{X},\bar{Y})$ is a conjunction of source relations, as on the left-hand side of a GAV description, and the right-hand side is a conjunction of mediated-schema relations, as in LAV.

• It allows source descriptions that contain recursive queries over sources.
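To make the three forms concrete, here are hypothetical descriptions over the university mediated schema; the sources $v_1$ to $v_4$ are invented for illustration and do not come from the slides. Variables that do not appear on the left-hand side (D and C in the LAV line, C in the GLAV line) are existential, and the GAV line projects away a hypothetical office attribute O.

```latex
\begin{aligned}
\text{LAV:}\quad  & v_1(P, U) \Rightarrow \mathit{profOf}(P, D) \wedge \mathit{deptOf}(D, C) \wedge \mathit{collegeOf}(C, U)\\
\text{GAV:}\quad  & v_2(P, D, O) \Rightarrow \mathit{profOf}(P, D)\\
\text{GLAV:}\quad & v_3(P, D) \wedge v_4(D, U) \Rightarrow \mathit{profOf}(P, D) \wedge \mathit{deptOf}(D, C) \wedge \mathit{collegeOf}(C, U)
\end{aligned}
```

The GLAV line joins two sources on the left and keeps an existential variable on the right, which neither a single GAV description nor a single LAV description can state directly.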

Data Integration Domain

• A set of mediated schema relations, a set of web schemas, and a set of source descriptions together form a data integration domain.

• It is denoted by the triple D = (R, {Gi}, SD), where
  R: the set of mediated schema relations
  {Gi}: the web schemas
  SD: the source descriptions.
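One way to picture the triple D = (R, {Gi}, SD) is as a plain data structure. In this Python sketch (all class and field names are hypothetical), R maps each mediated relation to its arity, each web schema is a node-to-linked-nodes map, and each source description is stored as its two conjunctions of atoms. The `well_formed` check is an illustrative extra, not something the slides define.

```python
from dataclasses import dataclass, field

Atom = tuple[str, tuple[str, ...]]  # (relation name, argument terms)

@dataclass(frozen=True)
class SourceDescription:
    """One GLAV sentence: a conjunction of source atoms => a conjunction of mediated atoms."""
    lhs: tuple[Atom, ...]
    rhs: tuple[Atom, ...]

@dataclass
class Domain:
    """D = (R, {Gi}, SD)."""
    relations: dict[str, int]  # R: mediated relation -> arity
    web_schemas: list[dict[str, set[str]]] = field(default_factory=list)  # {Gi}: node -> linked nodes
    descriptions: list[SourceDescription] = field(default_factory=list)   # SD

    def well_formed(self) -> bool:
        """Every mediated atom in SD uses a relation of R with the right arity."""
        return all(
            self.relations.get(rel) == len(args)
            for sd in self.descriptions
            for rel, args in sd.rhs
        )

R = {"collegeOf": 2, "deptOf": 2, "profOf": 2, "courseOf": 2, "chairOf": 2, "prereqOf": 2}
# Hypothetical source v1(P, U) => profOf(P, D) ^ deptOf(D, C) ^ collegeOf(C, U).
sd = SourceDescription(
    lhs=(("v1", ("P", "U")),),
    rhs=(("profOf", ("P", "D")), ("deptOf", ("D", "C")), ("collegeOf", ("C", "U"))),
)
domain = Domain(R, [{"Univ": {"College"}, "College": {"Dept"}}], [sd])
```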

How to Answer a Query?

• Queries are answered by a query processor.

• The user query is translated into a lower-level procedural program called an execution plan.

• A logical plan is constructed first.

• A navigational plan is then formed by augmenting the logical plan with navigational information.

• A navigational plan describes how to locate the desired relations in the data webs.

Logical Plan

• A logical plan is a Datalog program whose EDB relations are the source relations and whose answer predicate is q.

• The result of applying a Datalog program to a database is the set of tuples computed for the query predicate.

• Given a conjunctive query Q, a sound and complete logical plan for Q is constructed using an inverse-rules algorithm for GLAV called GlavInverse.

• Let T contain the sentences in the source descriptions; GlavInverse converts the theory T into a Datalog program.
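The semantics of a logical plan can be illustrated with a small Datalog evaluator. This is a minimal sketch using naive bottom-up (fixpoint) evaluation, which is one standard strategy and not necessarily the one the paper uses; rules are `(head, body)` pairs of atoms, terms starting with an upper-case letter are variables, and the source relation `direct` is hypothetical. The recursive rule shows why a fixpoint is needed: transitive prerequisites are only found after repeated passes.

```python
def is_var(term: str) -> bool:
    """Terms starting with an upper-case letter are variables."""
    return term[:1].isupper()

def match(body, db, binding):
    """Yield every extension of `binding` that satisfies all atoms in `body`."""
    if not body:
        yield binding
        return
    (rel, args), rest = body[0], body[1:]
    for fact in db.get(rel, set()):
        if len(fact) != len(args):
            continue
        new = dict(binding)
        # A variable must agree with any earlier binding; a constant must equal the value.
        if all(new.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(args, fact)):
            yield from match(rest, db, new)

def evaluate(rules, edb, answer):
    """Naive bottom-up evaluation: apply every rule until no new facts appear."""
    db = {rel: set(facts) for rel, facts in edb.items()}
    changed = True
    while changed:
        changed = False
        for (head_rel, head_args), body in rules:
            for binding in list(match(body, db, {})):
                fact = tuple(binding[t] if is_var(t) else t for t in head_args)
                if fact not in db.setdefault(head_rel, set()):
                    db[head_rel].add(fact)
                    changed = True
    return db.get(answer, set())

# Hypothetical source relation direct(C, P): course P is a direct prerequisite of C.
edb = {"direct": {("db2", "db1"), ("db3", "db2")}}
rules = [
    (("q", ("X", "Y")), [("direct", ("X", "Y"))]),
    (("q", ("X", "Z")), [("direct", ("X", "Y")), ("q", ("Y", "Z"))]),
]
answers = evaluate(rules, edb, "q")
```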

GlavInverse Algorithm

• Theorem: Let D = (R, {Gi}, SD) be an information integration domain, and let Q be a conjunctive query. Then the logical plan ∆ returned by GlavInverse is sound and complete.
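The core idea of an inverse-rules construction for one GLAV sentence can be sketched as follows, under our own reading: the left-hand conjunction of sources gets a fresh predicate u over the variables it shares with the right-hand side, and each mediated atom is derived from u, with every existential variable replaced by a Skolem term over those shared variables. The function `glav_inverse`, the names `u_<name>` and `f_<name>_<var>`, and the string output are hypothetical; this is not the paper's exact pseudocode.

```python
Atom = tuple[str, tuple[str, ...]]  # (relation name, argument terms)

def is_var(term: str) -> bool:
    """Terms starting with an upper-case letter are variables."""
    return term[:1].isupper()

def fmt(atom: Atom) -> str:
    rel, args = atom
    return f"{rel}({', '.join(args)})"

def glav_inverse(name: str, lhs: list[Atom], rhs: list[Atom]) -> list[str]:
    """Inverse rules, as Datalog-style strings, for one GLAV sentence lhs => rhs."""
    lhs_vars = {t for _, args in lhs for t in args if is_var(t)}
    # Variables shared by both sides, in order of first appearance on the right.
    shared: list[str] = []
    for _, args in rhs:
        for t in args:
            if is_var(t) and t in lhs_vars and t not in shared:
                shared.append(t)
    u_atom = (f"u_{name}", tuple(shared))
    # The source conjunction is named by a fresh predicate u.
    rules = [f"{fmt(u_atom)} :- {', '.join(fmt(a) for a in lhs)}."]
    skolem_args = ", ".join(shared)
    for rel, args in rhs:
        # Existential variables (right-hand side only) become Skolem terms.
        new_args = tuple(
            t if not is_var(t) or t in lhs_vars else f"f_{name}_{t}({skolem_args})"
            for t in args
        )
        rules.append(f"{fmt((rel, new_args))} :- {fmt(u_atom)}.")
    return rules

rules = glav_inverse(
    "s1",
    [("v3", ("P", "D")), ("v4", ("D", "U"))],
    [("profOf", ("P", "D")), ("deptOf", ("D", "C")), ("collegeOf", ("C", "U"))],
)
for rule in rules:
    print(rule)
```

In an inverse-rules plan, answers that still contain Skolem terms are discarded, since they stand for values that are known to exist but are not known.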

Navigational Plan

• Logical plans do not explain how to populate the source relations from the data webs, so they cannot be executed by themselves.

• Logical plans are therefore extended to navigational plans.

• Navigational plans are augmented Datalog programs.

• Navigational terms specify both the location and the logical content of a relation stored in the data web.

• A navigational term has the form P:v(X), where P is a path and v is a source relation.

• A path P starts at source(P) and ends at target(P).

• Trivial paths: if P = [N(X)], where N is a node and X is a variable or constant, then source(P) = target(P) = N(X).

• Compound paths: if P is a path with target(P) = N(X), and e is an edge from node N(X) to node M(Y), then P′ = [P -e-> M(Y)] is a path with source(P′) = source(P) and target(P′) = M(Y).

• Given a logical plan ∆ and the web schemas, the navigational plan algorithm produces a navigational plan ∆′.

• The navigational plan ∆′ produced by the algorithm is sound and complete.
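The path definitions and the idea of a navigational term P:v(X) can be sketched together: build a path through the web schema, then walk it over the pages of a data web to populate the source relation v. Everything here is hypothetical: the classes `PageTerm` and `Path`, the edge labels `colleges` and `depts`, and the in-memory `WEB` dictionary that stands in for fetching real pages by URL. The walk assumes the path starts at an entry point given by a constant, such as Univ(u1).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PageTerm:
    """N(X): a web-schema node N applied to a variable or constant X."""
    node: str
    arg: str

@dataclass(frozen=True)
class Path:
    """A path [N(X) -e1-> ... -ek-> M(Y)]; no hops means a trivial path."""
    start: PageTerm
    hops: tuple = ()  # sequence of (edge label, PageTerm) pairs

    def source(self) -> PageTerm:
        return self.start

    def target(self) -> PageTerm:
        return self.hops[-1][1] if self.hops else self.start

    def extend(self, edge: str, to: PageTerm, schema: set) -> "Path":
        """Compound path P' = [P -edge-> to]; `schema` holds (N, e, M) edge triples."""
        if (self.target().node, edge, to.node) not in schema:
            raise ValueError(f"no edge {edge} from {self.target().node} to {to.node}")
        return Path(self.start, self.hops + ((edge, to),))

# Hypothetical web schema and in-memory data web (page -> outgoing links and facts).
SCHEMA = {("Univ", "colleges", "College"), ("College", "depts", "Dept")}
WEB = {
    ("Univ", "u1"): {"links": {"colleges": ["eng"]}, "facts": {}},
    ("College", "eng"): {"links": {"depts": ["cs", "ee"]}, "facts": {}},
    ("Dept", "cs"): {"links": {}, "facts": {"v": {("p1", "cs"), ("p2", "cs")}}},
    ("Dept", "ee"): {"links": {}, "facts": {"v": {("p3", "ee")}}},
}

def fetch(node: str, key: str) -> dict:
    """Stand-in for retrieving one page by URL."""
    return WEB.get((node, key), {"links": {}, "facts": {}})

def evaluate_term(path: Path, relation: str) -> set:
    """Populate source relation `relation` for the term P:v(...) by walking P."""
    pages = [(path.start.node, path.start.arg)]  # entry point must be a constant
    for edge, to in path.hops:
        pages = [(to.node, nxt) for node, key in pages
                 for nxt in fetch(node, key)["links"].get(edge, [])]
    facts = set()
    for node, key in pages:
        facts |= fetch(node, key)["facts"].get(relation, set())
    return facts

p = Path(PageTerm("Univ", "u1"))
p = p.extend("colleges", PageTerm("College", "C"), SCHEMA).extend("depts", PageTerm("Dept", "D"), SCHEMA)
profs = evaluate_term(p, "v")
```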

Conclusions

• The paper shows how to extend data integration systems to incorporate data webs.

• It presents a formalism for modeling data webs and a language (GLAV) for source descriptions.

• It focuses on an algorithm for answering queries using GLAV source descriptions.

QUERIES?

• THANK YOU