data integration

Information Systems Group Candidacy Exam, Jan. 2010 Clio: Schema Mapping Creation and Data Exchange Presented by Leila Jalali

Upload: leila-jalali

Post on 30-Jun-2015


Page 1: Data Integration

Information Systems Group Candidacy Exam, Jan. 2010

Clio: Schema Mapping Creation and Data Exchange

Presented by Leila Jalali

Page 2: Data Integration

Information Systems Group Leila Jalali, Candidacy Exam

The Clio project

Source schema S

Target schema T

The user:
• Wants data from S
• Understands T
• May not understand S

[Figure: source data "conforms to" S and target data "conforms to" T; a schema mapping relates S to T, and a query Q transforms the source data into target data]

Clio addresses two main problems: how to generate schema mappings, and how to use them for data exchange.

Data exchange: use the mapping to transform data

Page 3: Data Integration

Outline

The Motivating Example
1. Schema Mapping Generation
   Mapping generation algorithm
2. Data Exchange
   Query generation algorithm
Conclusions

Page 4: Data Integration

A Motivating Example

Schema S:
Companies: Set of Rcd (Name, Address, Year)
Grants: Set of Rcd (Gid, Recipient, Amount, Supervisor, Manager)
Contacts: Set of Rcd (Cid, Email, Phone)

Schema T:
Organizations: Set of Rcd (Code, Year, Fundings)
  Fundings: Set of Rcd (FId, FinId)
Finances: Set of Rcd (FinId, Budget, Phone)

Correspondences v1, v2, v3, v4 (given by a "schema matcher" or a "user")

Constraints: f1, f2, f3 (in S) and f4 (in T)

Page 5: Data Integration

Correspondences

Using a tuple-generating dependency (tgd):

v1: ∀ n,d,y Companies(n,d,y) → ∃ y',F Organizations(n,y',F)

foreach c in companies
exists o in organizations
with o.code = c.name

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 6: Data Integration

More complex mappings

∀ n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃ y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)

foreach c in companies, g in grants
where c.name = g.recipient
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with o.code = c.name and f.fId = g.gId and i.budget = g.amount

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]
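To make the foreach/exists/with reading concrete, the mapping can be executed by hand over a small instance. The following is a minimal sketch, not Clio's generated query: the sample rows reuse values that appear later in the deck, the function names `sk_fin_id` and `exchange` are hypothetical, and the existential finId is filled with an invented Skolem-style string. Year and phone are left as `None` because this mapping does not constrain them.

```python
from collections import defaultdict

# Sample source rows (assumed for illustration; attribute names follow the slides).
companies = [
    {"name": "MS", "address": "SA", "year": 1976},
    {"name": "IBM", "address": "NY", "year": 1955},
]
grants = [
    {"gid": 301, "recipient": "MS", "amount": 30},
    {"gid": 302, "recipient": "MS", "amount": 40},
    {"gid": 303, "recipient": "IBM", "amount": 30},
]


def sk_fin_id(name, gid, amount):
    """Hypothetical Skolem term: a repeatable placeholder for the existential finId."""
    return f"Sk({name},{gid},{amount})"


def exchange(companies, grants):
    fundings_by_code = defaultdict(list)  # nested set o.fundings, keyed by o.code
    finances = []
    # foreach c in companies, g in grants where c.name = g.recipient
    for c in companies:
        for g in grants:
            if c["name"] != g["recipient"]:
                continue
            fin_id = sk_fin_id(c["name"], g["gid"], g["amount"])
            # with o.code = c.name and f.fId = g.gId
            fundings_by_code[c["name"]].append({"fid": g["gid"], "finId": fin_id})
            # with i.budget = g.amount; the shared fin_id enforces f.finId = i.finId
            finances.append({"finId": fin_id, "budget": g["amount"], "phone": None})
    organizations = [
        {"code": code, "year": None, "fundings": fs}
        for code, fs in fundings_by_code.items()
    ]
    return organizations, finances


organizations, finances = exchange(companies, grants)
```

Each matching (company, grant) pair contributes one funding and one finance record that share the same invented finId, which is what the repeated existential variable f in the tgd requires.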

Page 7: Data Integration

More complex mappings (cont.)

∀ n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃ y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)

The same mapping relates a query on the source (QS) to a query on the target (QT) through the correspondences:

foreach c in companies, g in grants
where c.name = g.recipient
    (query on the source: QS)
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
    (query on the target: QT)
with o.code = c.name and f.fId = g.gId and i.budget = g.amount
    (correspondences)

Page 8: Data Integration

Outline

The Motivating Example
1. Schema Mapping Generation
   Mapping generation algorithm
2. Data Exchange
   Query generation algorithm
Conclusions

Page 9: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Page 10: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
    from g in grants
    from p in companies
• Target schema: generate all possible associations within the target
    from o in organizations

[Figure: schemas S and T with constraints f1–f4]

Page 11: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Logical associations:
• Build larger associations in the source (AS) and the target (AT)

Page 12: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Logical associations:
• Build larger associations in the source (AS) and the target (AT), starting with a structural association and "chasing" constraints

[Figure: AS built over Companies, Grants, and Contacts by following constraints f1, f2, f3]

Page 13: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Logical associations:
• Build larger associations in the source (AS) and the target (AT)

Clio mapping:
• Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping:
    foreach AS exists AT with W
  where W is the conjunction of the equalities h(eS) = h'(eT) (captured from the correspondences)

Page 14: Data Integration

Clio mapping, example

AS : from g in grants, c in companies, s in contacts, m in contacts
     where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid

AT : from o in organizations, f in o.fundings, i in finances
     where f.finId = i.finId

v1, v2, v3 are covered

Generate a Clio mapping: foreach AS exists AT with W, where W is the conjunction of the equalities h(eS) = h'(eT)

foreach g in grants, c in companies, s in contacts, m in contacts
where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with c.name = o.code and g.gId = f.fId and g.amount = i.budget

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 15: Data Integration

Dominance

A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming).

A2 : from g in grants, c in companies, s in contacts, m in contacts
     where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid

A1 : from g in grants, c in companies
     where g.recipient = c.name
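The dominance test can be sketched as a brute-force search for a renaming. This is an illustrative sketch under stated assumptions: an association is encoded as a pair (list of (variable, set) generators, list of (lhs, rhs) equalities), the renaming is assumed to be injective, and the function names `rename` and `dominates` are hypothetical rather than taken from Clio.

```python
from itertools import permutations


def rename(expr, h):
    """Apply the variable renaming h to an expression such as 'g.recipient'."""
    var, dot, rest = expr.partition(".")
    return h.get(var, var) + dot + rest


def dominates(a2, a1):
    """True if a1 <= a2: some renaming of a1's variables makes its from and
    where clauses subsets of those of a2."""
    gens1, where1 = a1
    gens2, where2 = a2
    vars1 = [v for v, _ in gens1]
    vars2 = [v for v, _ in gens2]
    gens2_set = set(gens2)
    # Equalities are symmetric, so compare them as unordered pairs.
    where2_set = {frozenset(eq) for eq in where2}
    for image in permutations(vars2, len(vars1)):
        h = dict(zip(vars1, image))
        gens_ok = all((h[v], rename(coll, h)) in gens2_set for v, coll in gens1)
        where_ok = all(
            frozenset((rename(lhs, h), rename(rhs, h))) in where2_set
            for lhs, rhs in where1
        )
        if gens_ok and where_ok:
            return True
    return False


A1 = ([("g", "grants"), ("c", "companies")], [("g.recipient", "c.name")])
A2 = (
    [("g", "grants"), ("c", "companies"), ("s", "contacts"), ("m", "contacts")],
    [("g.recipient", "c.name"), ("g.supervisor", "s.cid"), ("g.manager", "m.cid")],
)
result = dominates(A2, A1)
```

Trying every injective assignment is exponential in the number of variables, which is acceptable for associations of the size shown on the slide.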

Page 16: Data Integration

Coverage of a correspondence

A correspondence v : foreach PS exists PT with eS = eT
is covered by a pair of associations <AS, AT> if PS ≤ AS and PT ≤ AT with some renamings h, h'.

Example:

AS : from c in companies
AT : from o in organizations

v : foreach c in companies exists o in organizations with c.name = o.code
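The coverage check for the example can be sketched as follows. To stay short, this sketch assumes the renamings h and h' are the identity (variable names already agree), so coverage reduces to a subset test on generators; the tuple encoding and the names `subsumed` and `covered` are assumptions for illustration.

```python
def subsumed(path, assoc):
    """True if every generator of `path` occurs in `assoc` (identity renaming assumed)."""
    return set(path) <= set(assoc)


def covered(correspondence, a_s, a_t):
    """True if the correspondence (PS, PT, equality) is covered by <AS, AT>."""
    p_s, p_t, _equality = correspondence
    return subsumed(p_s, a_s) and subsumed(p_t, a_t)


# v: foreach c in companies exists o in organizations with c.name = o.code
v = ([("c", "companies")], [("o", "organizations")], ("c.name", "o.code"))
a_s = [("c", "companies")]
a_t = [("o", "organizations")]
is_covered = covered(v, a_s, a_t)
```

A full implementation would search over renamings, as the definition allows, instead of requiring the variable names to match.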

Page 17: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Logical associations:
• Build larger associations in the source (AS) and the target (AT)

Clio mapping:
• Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping:
    foreach AS exists AT with W
  where W is the conjunction of the equalities h(eS) = h'(eT) (captured from the correspondences)

Page 18: Data Integration

Mapping Generation

Structural associations:
• Source schema: generate all possible associations within the source
• Target schema: generate all possible associations within the target

Logical associations:
• Build larger associations in the source (AS) and the target (AT)

Clio mapping:
• Use a pair <AS, AT> and the correspondences covered by <AS, AT> to generate a Clio mapping:
    foreach AS exists AT with W
  where W is the conjunction of the equalities h(eS) = h'(eT) (captured from the correspondences)
• Add the Clio mapping to the set of mappings

Page 19: Data Integration


Finds maximal sets of correspondences that can be interpreted together

Discard the “larger” mapping

Generate a Clio mapping

Logical associations are meaningful combinations of correspondences

Page 20: Data Integration

Outline

The Motivating Example
1. Schema Mapping Generation
   Mapping generation algorithm
2. Data Exchange
   Query generation algorithm
Conclusions

Page 21: Data Integration

Query generation for data exchange

[Figure: the source schema and target schema feed mapping generation, which in turn feeds query generation]

Page 22: Data Integration

Overview of Query Generation

Input: a Clio mapping

1. Construct a query graph that represents the key portions of the query
2. Annotate the graph to generate Skolem terms
3. Traverse the graph and produce the query

Output: the data exchange query (in SQL, XQuery, or XSLT)

[Figure: annotated query graph with target nodes y0 (organizations) and y1 (fundings) and source expressions x0.name, x1.gid, x1.amount]

Page 23: Data Integration

1. Constructing the Query Graph

Add a node for each variable in the exists clause:

y0 (organizations)
y1 (fundings)
y2 (finances)

Page 24: Data Integration

1. Constructing the Query Graph (cont.)

Add nodes for all the atomic-type elements reachable from these nodes via record projection:

y0 (organizations): y0.code, y0.year
y1 (fundings): y1.fid, y1.finId
y2 (finances): y2.finId, y2.budget, y2.phone

Page 25: Data Integration

1. Constructing the Query Graph (cont.)

Add structural edges to reflect the relationships between nodes.

[Figure: query graph with nodes y0 (organizations), y1 (fundings), y2 (finances) and their atomic elements, connected by structural edges]

Page 26: Data Integration

1. Constructing the Query Graph (cont.)

Add the source nodes for all source expressions in the with clause:

x0.name
x1.gid
x1.amount
x2.phone

Page 27: Data Integration

1. Constructing the Query Graph (cont.)

Attach the source nodes to the target nodes to which they are "equal".

[Figure: query graph with source nodes x0.name, x1.gid, x1.amount, x2.phone attached to the target nodes they are equated with in the with clause]

Page 28: Data Integration

1. Constructing the Query Graph (cont.)

Use the equalities in the where clause to add edges between target nodes.

[Figure: query graph with an equality edge added between target nodes, e.g. for f.finId = i.finId]

Page 29: Data Integration

2. Annotating the Graph

Each node is annotated with a set of source expressions.

Upward propagation: every expression that a node acquires is propagated to its parent node, unless the (acquiring) node is a variable.

[Figure: query graph after upward propagation of x0.name, x1.gid, x1.amount, x2.phone]

Page 30: Data Integration

2. Annotating the Graph (cont.)

Downward propagation: every expression that a node acquires is propagated to its children.

[Figure: query graph after downward propagation]

Page 31: Data Integration

2. Annotating the Graph (cont.)

Equality propagation: every expression that a node acquires is propagated to the nodes related to it through equality edges.

[Figure: query graph after equality propagation]

Page 32: Data Integration

2. Annotating the Graph (cont.)

Apply the rules until no more rules can be applied.

[Figure: fully annotated query graph]
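The three propagation rules can be run to a fixpoint with a simple loop. This sketch applies the rules literally as worded on the slides; the published algorithm may restrict propagation further, so the sets computed here can be larger than those in the figure. The graph encoding, the initial attachments, and the function name `annotate` are assumptions for illustration.

```python
# Assumed graph shape for the running example.
children = {
    "y0": ["y0.code", "y0.year", "y1"],
    "y1": ["y1.fid", "y1.finId"],
    "y2": ["y2.finId", "y2.budget", "y2.phone"],
}
variables = {"y0", "y1", "y2"}
equality_edges = [("y1.finId", "y2.finId")]
attached = {
    "y0.code": {"x0.name"},
    "y1.fid": {"x1.gid"},
    "y2.budget": {"x1.amount"},
    "y2.phone": {"x2.phone"},
}


def annotate(children, variables, equality_edges, attached):
    parent = {c: p for p, cs in children.items() for c in cs}
    nodes = set(children) | set(parent)
    ann = {n: set(attached.get(n, ())) for n in nodes}
    changed = True

    def push(src, dst):
        """Copy src's annotations into dst and record whether dst grew."""
        nonlocal changed
        if not ann[src] <= ann[dst]:
            ann[dst] |= ann[src]
            changed = True

    while changed:
        changed = False
        for node in nodes:
            # Upward: to the parent, unless the acquiring node is a variable.
            if node in parent and node not in variables:
                push(node, parent[node])
            # Downward: to every child.
            for child in children.get(node, ()):
                push(node, child)
        # Equality: across equality edges, in both directions.
        for a, b in equality_edges:
            push(a, b)
            push(b, a)
    return ann


ann = annotate(children, variables, equality_edges, attached)
```

Because every rule only adds expressions to finite sets, the loop is monotone and stops once a full pass changes nothing.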

Page 33: Data Integration

3. Generation of Transformation Queries

The foreach clause is converted to a query fragment:

Generate the query fragment:

Page 34: Data Integration

3. Generation of Transformation Queries

Perform a depth-first traversal of the graph.

[Figure: fully annotated query graph]

Page 35: Data Integration

3. Generation of Transformation Queries

[Figure: fully annotated query graph during the traversal]

Page 36: Data Integration


Finally we have the Query:

Page 37: Data Integration

Clio: Conclusion

Clio provides tools that help automate and manage the problem of data conversion.

The key contributions of Clio:
• Schema mapping generation
  - Mapping as a query discovery problem
  - Capable of mapping between relational and nested schemas
• Query generation for data exchange
  - SQL, XQuery, XSLT, generating Skolems, ...

Page 38: Data Integration


Thanks

Page 39: Data Integration

Back ups

• Clio requirements
• Complex mappings: using associations
• Definitions:
  - Mapping language
  - Paths
  - Schema & types
  - Dominance
• Query generation challenges, the problem of recursion in XML schema
• Nested Referential Integrity (NRI) constraints
• The Chase

Page 40: Data Integration

The Clio project: overview of the requirements

[Figure: source data "conforms to" source schema S and target data "conforms to" target schema T; a schema mapping relates S to T, and a query Q transforms the data]

• No assumptions about the schemas
• A general mapping language
• Capable of mapping between relational schemas and nested schemas
• Mapping at different levels of granularity
• Incremental mapping algorithms

Page 41: Data Integration

Formalize correspondences

Using tuple-generating dependencies (tgds):

v1: ∀ n,d,y Companies(n,d,y) → ∃ y',F Organizations(n,y',F)

v2: ∀ n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃ y',F,f Organizations(n,y',F), F(g,f)

v3: ∀ g,r,a,s,m Grants(g,r,a,s,m) → ∃ f,p Finances(f,a,p)

v4: ∀ c,e,p Contacts(c,e,p) → ∃ f,b Finances(f,b,p)

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 42: Data Integration

Correspondences alone are not enough

Companies
Name   Address   Year
MS     SA        1976
AT&T   TX        1980
IBM    NY        1955

Grants
GId   Rec.t   Amt
301   MS      30
302   MS      40
303   IBM     30

Target: Organizations with Code values MS, AT&T, IBM, and Fundings with FId values such as 301 and 302, but no indication of which funding belongs to which organization.

How should individual data values be connected in the target?

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 43: Data Integration

More complex mappings are needed

The "association" between companies and grants in the source is suggested by f1 (a foreign key).

∀ n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃ y',F,f Organizations(n,y',F), F(g,f)

Companies
Name   Address   Year
MS     SA        1976
AT&T   TX        1980
IBM    NY        1955

Grants
GId   Rec.t   Amt
301   MS      30
302   MS      40
303   IBM     30

Organizations
Code   Year   Fundings (FId, FinId)
MS            301, 302
AT&T
IBM           303

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]
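The effect of following f1 can be reproduced on the slide's instance. This is a small sketch, assuming tuples in the column order shown on the slide and a hypothetical helper name `nest_fundings`; it creates one organization per company (as v1 requires) and nests each grant's id under the company named by its recipient.

```python
# Source instance from the slide: (Name, Address, Year) and (GId, Rec.t, Amt).
companies = [("MS", "SA", 1976), ("AT&T", "TX", 1980), ("IBM", "NY", 1955)]
grants = [(301, "MS", 30), (302, "MS", 40), (303, "IBM", 30)]


def nest_fundings(companies, grants):
    """Group grant ids under the organization whose code equals the recipient."""
    organizations = {name: [] for name, _address, _year in companies}
    for gid, recipient, _amount in grants:
        # f1: grants.recipient references companies.name
        if recipient in organizations:
            organizations[recipient].append(gid)
    return organizations


organizations = nest_fundings(companies, grants)
```

The grouping is what the correspondences alone cannot express: v1 and v2 applied independently would produce codes and funding ids without tying them together.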

Page 44: Data Integration

Yet more complex...

v3: ∀ g,r,a,s,m Grants(g,r,a,s,m) → ∃ f,p Finances(f,a,p)

∀ n,d,y,g,a,s,m Companies(n,d,y), Grants(g,n,a,s,m) → ∃ y',F,f,p Organizations(n,y',F), F(g,f), Finances(f,a,p)

• Three tuples are generated for each pair of related companies and grants
• The mapping specifies that there exists an f, appearing in two places, without saying what its value must be

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 45: Data Integration

Yet more complex...

v4: ∀ c,e,p Contacts(c,e,p) → ∃ f,b Finances(f,b,p)

• How do we obtain the phone to be put in finances?
• Is it the supervisor's or the manager's?
• FKs suggest either (or even both)
• Human intervention is needed to choose

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4]

Page 46: Data Integration

The Mapping Language: Syntax

foreach x1 in g1, ..., xn in gn
where B1
exists y1 in g'1, ..., ym in g'm
where B2
with e1 = e'1 and ... and ek = e'k

• xi in gi (generator): xi is a variable; gi is a set (either the root or a set nested within it)
• B1: a conjunction of equalities over the xi variables
• e1 = e'1, ...: equalities between a source expression and a target expression

The example:

foreach c in companies, g in grants
where c.name = g.recipient
exists o in organizations, f in o.fundings, i in finances
where f.finId = i.finId
with o.code = c.name and f.fId = g.gId and i.budget = g.amount
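One way to hold a mapping of this shape in memory is a small record with one field per clause. The class below is a sketch for illustration only: the name `Mapping`, its field names, and the `render` method are assumptions, not part of Clio.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    foreach: list  # generators [(variable, set expression)] over the source
    exists: list   # generators [(variable, set expression)] over the target
    with_: list    # equalities [(expression, expression)] linking source and target
    source_where: list = field(default_factory=list)  # B1
    target_where: list = field(default_factory=list)  # B2

    def render(self):
        """Print the mapping in the foreach/exists/with syntax of the slide."""
        def gens(generators):
            return ", ".join(f"{v} in {g}" for v, g in generators)

        def eqs(equalities):
            return " and ".join(f"{lhs} = {rhs}" for lhs, rhs in equalities)

        lines = ["foreach " + gens(self.foreach)]
        if self.source_where:
            lines.append("where " + eqs(self.source_where))
        lines.append("exists " + gens(self.exists))
        if self.target_where:
            lines.append("where " + eqs(self.target_where))
        lines.append("with " + eqs(self.with_))
        return "\n".join(lines)


m = Mapping(
    foreach=[("c", "companies"), ("g", "grants")],
    exists=[("o", "organizations"), ("f", "o.fundings"), ("i", "finances")],
    with_=[("o.code", "c.name"), ("f.fId", "g.gId"), ("i.budget", "g.amount")],
    source_where=[("c.name", "g.recipient")],
    target_where=[("f.finId", "i.finId")],
)
text = m.render()
```

Calling `m.render()` reproduces the five-line example mapping from the slide.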

Page 47: Data Integration

Primary and Relative Paths

Primary path (given a schema root R, that is, a first-level element in the schema):
  x1 in g1, x2 in g2, ..., xn in gn
where g1 is an expression on R (just R?) and gi (for i ≥ 2) is an expression on xi-1

Examples:
  c in companies
  o in organizations, f in o.fundings

Relative path with respect to a variable x:
  x1 in g1, x2 in g2, ..., xn in gn
where g1 is an expression on x and gi (for i ≥ 2) is an expression on xi-1

Example:
  f in o.fundings

Page 48: Data Integration

Schema and types

A schema: a sequence of labels (roots), each with an associated type, defined by this grammar:

• Atomic types
• A set type
• Complex types
  - Repeated elements
  - All and choice model-groups

Instances: associate a value with each schema root
• A value for atomic types
• An unordered tuple of pairs
• A pair
• setID

Page 49: Data Integration


Correspondences

Page 50: Data Integration


the data exchange problem

Page 51: Data Integration

Query generation challenges

1. Creation of New Values in the Target

• Optional: null
• Not nullable: one-to-one Skolem function

[Figure: example with elements name, salary, spouse, dateofbirth]

But if it is emp ID
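The two options for a target element with no source value can be sketched as follows. This is an illustration under assumptions: the class name `Skolem`, the helper `new_target_value`, and the generated labels such as `finId_1` are invented here; the only property taken from the slide is that the Skolem function is one-to-one.

```python
class Skolem:
    """One-to-one Skolem function: equal arguments yield the same fresh value,
    different arguments yield different values."""

    def __init__(self, name):
        self.name = name
        self._ids = {}

    def __call__(self, *args):
        if args not in self._ids:
            self._ids[args] = f"{self.name}_{len(self._ids) + 1}"
        return self._ids[args]


def new_target_value(nullable, skolem, *source_values):
    """Optional element: null. Non-nullable element: a Skolem value."""
    return None if nullable else skolem(*source_values)


fin_id = Skolem("finId")
a = fin_id("MS", 301)
b = fin_id("MS", 302)
```

Memoizing on the argument tuple is what lets the same invented value appear in two target places, for example as the finId of a funding and of the matching finance record.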

Page 52: Data Integration

Query generation challenges

1. Creation of New Values in the Target

Referential constraints

Page 53: Data Integration

Query generation challenges

2. Grouping Nested Elements

Page 54: Data Integration

Query generation challenges

3. Value Creation Interacts with Grouping

Page 55: Data Integration


Recursion in XML schema

Page 56: Data Integration

The Chase

Given an association, repeatedly apply a chase rule to the "current" association (initialized as the input one):

If there is an NRI constraint
  foreach X exists Y where B
such that the "current" association contains X and does not contain a Y that satisfies B, then add Y to the generators and B to the where clause.

Example. If we start with
  from g in grants
then we have to add various components and obtain
  from g in grants, c in companies, s in contacts, m in contacts
  where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid
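The chase rule can be sketched as a loop that keeps adding generators until every constraint is satisfied. This is a simplified illustration, assuming each NRI constraint relates one attribute of a top-level set to one attribute of another; the tuple encoding, the variable-name hints, and the function names `fresh` and `chase` are hypothetical. The loop is only guaranteed to stop for acyclic constraint sets such as the one used here.

```python
def fresh(hint, used):
    """Return a variable name based on `hint` that is not in `used`."""
    name, n = hint, 1
    while name in used:
        n += 1
        name = f"{hint}{n}"
    return name


def chase(generators, where, constraints):
    """Chase an association (generators, where) with constraints of the form
    (x_set, x_attr, y_set, y_attr, variable hint)."""
    generators, where = list(generators), list(where)
    changed = True
    while changed:
        changed = False
        for x_set, x_attr, y_set, y_attr, hint in constraints:
            for x, coll in list(generators):
                if coll != x_set:
                    continue
                satisfied = any(
                    (f"{x}.{x_attr}", f"{y}.{y_attr}") in where
                    for y, c in generators
                    if c == y_set
                )
                if not satisfied:
                    y = fresh(hint, {v for v, _ in generators})
                    generators.append((y, y_set))                 # add Y
                    where.append((f"{x}.{x_attr}", f"{y}.{y_attr}"))  # add B
                    changed = True
    return generators, where


# f1, f2, f3 from the source schema of the running example.
constraints = [
    ("grants", "recipient", "companies", "name", "c"),
    ("grants", "supervisor", "contacts", "cid", "s"),
    ("grants", "manager", "contacts", "cid", "m"),
]
gens, conds = chase([("g", "grants")], [], constraints)
```

Starting from `from g in grants`, the result matches the association shown on the slide.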

Page 57: Data Integration

Clio: Analysis and Conclusion

Termination and complexity of the chase:
• The chase with general dependencies may not terminate
  - Cyclic dependencies
• NRIs: for a weakly acyclic set, the number of chase steps is polynomial

Conclusion

Page 58: Data Integration

Clio mapping

A Clio mapping: foreach AS exists AT with E
• AS, AT: logical associations (on source and target, resp.)
• E: a conjunction of equalities: for each correspondence v in C covered by <AS, AT>, E includes the equality h(eS) = h'(eT) that results from the coverage, for one of the coverages

Page 59: Data Integration

Structural Association

Structural association: from P (with P a primary path)
• Starts from the root of the schema

[Figure: schemas S (Companies, Grants, Contacts) and T (Organizations with nested Fundings, Finances)]

Page 60: Data Integration

Nested Referential Integrity (NRI) constraints

The basis for the discovery of associations: NRIs capture relational foreign key and referential constraints as well as XML keyref constraints:

  foreach P1 exists P2 where B

• P1 is a primary path
• P2 is a primary path or a relative path with respect to a variable in P1
• B is a conjunction of equalities between an expression on a variable of P1 and an expression on a variable of P2

Paths:
  o in organizations, f in o.fundings
  f in o.fundings

Example (f4):
  foreach o in organizations, f in o.fundings
  exists i in finances
  where f.finId = i.finId

[Figure: target schema T (Organizations with nested Fundings, Finances) with constraint f4]
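Read as a condition on a target instance, the example constraint says that every nested funding must point at an existing finance record. A minimal check, assuming a list-of-dictionaries instance and the hypothetical function name `satisfies_f4`:

```python
def satisfies_f4(organizations, finances):
    """foreach o in organizations, f in o.fundings
    exists i in finances where f.finId = i.finId"""
    fin_ids = {i["finId"] for i in finances}
    return all(
        f["finId"] in fin_ids
        for o in organizations
        for f in o["fundings"]
    )


# Invented sample instance for illustration.
organizations = [
    {"code": "MS", "fundings": [{"fid": 301, "finId": "k1"}, {"fid": 302, "finId": "k2"}]},
    {"code": "AT&T", "fundings": []},
]
finances_ok = [{"finId": "k1", "budget": 30}, {"finId": "k2", "budget": 40}]
finances_bad = [{"finId": "k1", "budget": 30}]

ok = satisfies_f4(organizations, finances_ok)
bad = satisfies_f4(organizations, finances_bad)
```

An organization with no fundings satisfies the constraint trivially, because the foreach part has nothing to range over.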

Page 61: Data Integration

Logical Association

Logical association: semantic relationships between schema elements
• Obtained by starting with a structural association and "chasing" NRI constraints

Page 62: Data Integration

Logical Association: the Chase

Start with a structural association.

[Figure: schemas S and T with correspondences v1–v4 and constraints f1–f4; constraints f2 and f3 are highlighted]

Page 63: Data Integration

Logical Association Relationships

A2 dominates A1 (A1 ≤ A2) if the from and where clauses of A1 are subsets of those of A2 (after suitable renaming).

A2 : from g in grants, c in companies, s in contacts, m in contacts
     where g.recipient = c.name and g.supervisor = s.cid and g.manager = m.cid

A1 : from g in grants, c in companies
     where g.recipient = c.name

Page 64: Data Integration

Mapping Generation Algorithm

Inputs: S, T, correspondences

• Generate all logical associations: AS, AT
  - Logical associations are meaningful combinations of correspondences
• For each suitable pair <AS, AT>: find the correspondences covered by the pair with some renaming <h, h'>, and check for dominance
  - Which correspondences can be interpreted together?
• Generate a Clio mapping: foreach AS exists AT with W
  - W is the equality h(eS) = h'(eT)
• Add the Clio mapping to the set of mappings

Output: the set of schema mappings

Example:

AS : from c in companies
AT : from o in organizations

M : foreach c in companies exists o in organizations with c.name = o.code
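The pairing-and-coverage loop of the algorithm can be sketched as below. This covers only part of the algorithm and rests on several assumptions: associations are plain generator lists, a correspondence is a tuple (source set, source attribute, target set, target attribute), coverage is decided by set membership instead of a renaming search, and the dominance check is omitted. The function names `var_for` and `generate_mappings` are hypothetical.

```python
def var_for(assoc, collection):
    """First variable in the association that ranges over `collection`, or None."""
    return next((v for v, c in assoc if c == collection), None)


def generate_mappings(source_assocs, target_assocs, correspondences):
    mappings = []
    for a_s in source_assocs:
        for a_t in target_assocs:
            w = []  # equalities contributed by the covered correspondences
            for s_set, s_attr, t_set, t_attr in correspondences:
                x, y = var_for(a_s, s_set), var_for(a_t, t_set)
                if x is not None and y is not None:  # correspondence is covered
                    w.append((f"{x}.{s_attr}", f"{y}.{t_attr}"))
            if w:  # pairs that cover nothing produce no mapping
                mappings.append((a_s, a_t, w))
    return mappings


source_assocs = [
    [("c", "companies")],
    [("g", "grants"), ("c", "companies")],
]
target_assocs = [
    [("o", "organizations")],
    [("o", "organizations"), ("f", "o.fundings")],
]
correspondences = [
    ("companies", "name", "organizations", "code"),
    ("grants", "gid", "o.fundings", "fid"),
]
mappings = generate_mappings(source_assocs, target_assocs, correspondences)
```

Without pruning, three of the four resulting mappings carry the same single equality c.name = o.code, which illustrates why the algorithm checks for dominance before adding a mapping to the output set.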