Implementing Mapping Composition

Todd J. Green* (University of Pennsylvania), with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), Alan Nash (UC San Diego)

VLDB 2006, Seoul, Korea

*Work partially supported by NSF grants IIS0513778 and IIS0415810


Page 1: Implementing Mapping Composition

Implementing Mapping Composition

Todd J. Green* University of Pennsylvania

with Philip A. Bernstein (Microsoft Research),

Sergey Melnik (Microsoft Research),

Alan Nash (UC San Diego)

VLDB 2006, Seoul, Korea

*Work partially supported by NSF grants IIS0513778 and IIS0415810

Page 2: Implementing Mapping Composition

Schema mappings

Mapping: a correspondence between instances of different schemas

S1: Students(Name, Address)
S2: Names(SID, Name), Addresses(SID, Address)

m (between S1 and S2):
Students = π_{Name,Address}(Names ⋈ Addresses)

Page 3: Implementing Mapping Composition

Applications of mappings

Schema evolution

S1: Students(Name, Address, Country)
S2: Names(SID, Name), Addresses(SID, Address, Country)
S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)
...

m12 (between S1 and S2):
Students = π_{Name,Address,Country}(Names ⋈ Addresses)

m23 (between S2 and S3):
Names = Names
σ_{Country = KR}(Addresses) = π_{SID,Address}(Local) × {KR}
σ_{Country ≠ KR}(Addresses) = Foreign

Page 4: Implementing Mapping Composition

Applications of mappings

Data integration, data exchange

Schemas S1, ..., Sn−1, Sn with mappings m1, ..., mn. Relations shown:
Students(Name, Address, Country)
Names(SID, Name), Addresses(SID, Address, Country)
Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)

Mapping constraints shown:
Students = π_{Name,Address}(Names ⋈ Addresses)
Names = Names
Local = π_{SID,Address}(σ_{Country = KR}(Addresses))
Foreign = σ_{Country ≠ KR}(Addresses)

Page 5: Implementing Mapping Composition

Requirements for constraints

“First attribute in R is a key for R”
π_{2,4}(R ⋈_{1=3} R) ⊆ π_{2,2}(R)

“View V equals R joined with S”
V ⊆ R ⋈ S, V ⊇ R ⋈ S

“Second attribute of R is a foreign key in S”
π_2(R) ⊆ π_1(S)
π_{2,4}(S ⋈_{1=3} S) ⊆ π_{2,2}(S)

Data integration, data exchange (GLAV)
R ⋈ S ⊆ T ⋈ U
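As a sanity check on this notation, the key constraint π_{2,4}(R ⋈_{1=3} R) ⊆ π_{2,2}(R) can be evaluated directly on a small instance. The sketch below is illustrative only: relations are modeled as Python sets of tuples, attribute positions are 1-based as on the slide, and the helper names (`project`, `join_eq`, `first_attribute_is_key`) are assumptions made for this example, not part of the tool described in the talk.

```python
def project(rel, cols):
    """pi_{cols}(rel): keep the listed 1-based positions (repeats allowed)."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def join_eq(left, right, i, j):
    """left join_{i=j} right: concatenate tuples whose positions i and j
    (counted in the concatenated tuple) hold equal values."""
    return {l + r for l in left for r in right if (l + r)[i - 1] == (l + r)[j - 1]}


def first_attribute_is_key(rel):
    """Check pi_{2,4}(R join_{1=3} R) subset-of pi_{2,2}(R) for a binary relation R."""
    return project(join_eq(rel, rel, 1, 3), [2, 4]) <= project(rel, [2, 2])


good = {(1, "a"), (2, "b")}   # first attribute determines the second
bad = {(1, "a"), (1, "b")}    # key value 1 maps to two different values

print(first_attribute_is_key(good))  # True
print(first_attribute_is_key(bad))   # False
```

The same pattern (build both sides as sets, compare with `<=`) would apply to the foreign-key and GLAV constraints as well.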

Page 6: Implementing Mapping Composition

Mapping composition

S1: Students(Name, Address, Country)
S2: Names(SID, Name), Addresses(SID, Address, Country)
S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)

m12:
Students = π_{Name,Address,Country}(Names ⋈ Addresses)

m23:
Names = Names
σ_{Country = KR}(Addresses) = π_{SID,Address}(Local) × {KR}
σ_{Country ≠ KR}(Addresses) = Foreign

m12 ∘ m23 (between S1 and S3):
Students = π_{Name,Address,Country}(Names ⋈ (π_{SID,Address}(Local) × {KR} ∪ Foreign))

Page 7: Implementing Mapping Composition

Composition is hard

- Hard part: write the composition in the same language as the input mappings. Depending on the language:
  - Not always possible
  - Not even decidable whether possible
- Strategy 1: use a powerful (second-order) mapping language closed under composition [FKPT04]
  - Not supported by DBMS today
  - Expensive to check
  - Source-target restriction
- Strategy 2: settle for partial solutions [NBM05]
  - Containment mappings
  - Easier integration with DBMS
  - The strategy we adopt in this work

Page 8: Implementing Mapping Composition

Our contributions

- New algorithm for the composition problem
  - Incorporates view unfolding and left-composition (new technique)
  - Makes best effort in failure cases
  - Algebraic rather than logic-based mappings
  - Use of monotonicity to handle more operators
  - Modular and extensible factoring of algorithm
- First implementation of composition
- Experimental evaluation

Page 9: Implementing Mapping Composition

Formal definition of composition

Mapping: set of pairs of instances of db schemas

The composition m12 ∘ m23 is the mapping

{⟨A,C⟩ : (∃B)(⟨A,B⟩ ∈ m12 and ⟨B,C⟩ ∈ m23)}

where A, B, C are instances of S1, S2, S3

Composition problem: find constraints in the same language as the input mappings giving the composition of the input mappings

Example:

S1 = {R}, S2 = {S,T}, S3 = {U,V,W}
R(∙,∙,∙); S(∙,∙), T(∙,∙); U(∙,∙,∙), V(∙,∙), W(∙,∙)

m12: R ⊆ S ⋈ T
m23: S ⊆ π(U), T = V - W

⇒ R ⊆ π(U) ⋈ (V - W)
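The set-of-pairs definition of composition can be read as ordinary relational composition over database instances. A minimal sketch, assuming each instance is represented by a hashable value (here a `frozenset` of tuples) and each mapping by an explicit finite set of instance pairs; real mappings are given by constraints and are usually infinite, so this only illustrates the definition, and the name `compose` is hypothetical:

```python
def compose(m12, m23):
    """Return {(A, C) : there is a B with (A, B) in m12 and (B, C) in m23}."""
    return {(a, c) for (a, b1) in m12 for (b2, c) in m23 if b1 == b2}


# Toy instances of S1, S2, S3 (contents are arbitrary illustration data).
A = frozenset({("x",)})
B1 = frozenset({("x", 1)})
B2 = frozenset({("x", 2)})
C = frozenset({(1,)})

m12 = {(A, B1), (A, B2)}   # A is related to two instances of S2
m23 = {(B1, C)}            # only B1 is related to an instance of S3

print(compose(m12, m23) == {(A, C)})  # True
```

The intermediate instance B is existentially quantified away, which is why the composition problem asks for constraints that no longer mention the symbols of S2.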

Page 10: Implementing Mapping Composition

Best-effort composition problem

- Composition not always possible
- “Best-effort” composition problem: compute a set of constraints equivalent to the input constraints, but with as many symbols from S2 eliminated as possible

R ⊆ U,                          R ⊆ V,
π_{1,4}(σ_{2=3}(U × U)) ⊆ U,    π_{1,4}(σ_{2=3}(V × V)) ⊆ V,
U ⊆ T,                          V ⊆ T

- Can eliminate U (cross out left column) or V (right column), but not both [NBM05]

Page 11: Implementing Mapping Composition

11

Composition algorithm overview

For each relation R in S2

Try to eliminate R via (1) view unfolding

Replace = by pairs of ⊆, ⊇For each relation R in S2 not yet eliminated

Try to eliminate R via (2) left composeElse, try to eliminate R via (3) right compose

Output:

New constraints and list of relations successfully eliminated
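The control flow of this overview can be sketched as a small driver. Everything concrete here is an assumption made for illustration: constraints are `(lhs, op, rhs)` triples with `op` either `"="` or `"<="`, each elimination step is passed in as a function that returns new constraints or `None` on failure, and the stub steps in the demo do not implement the real techniques.

```python
def split_equalities(constraints):
    """Replace each (l, '=', r) by the pair (l, '<=', r), (r, '<=', l)."""
    out = []
    for lhs, op, rhs in constraints:
        if op == "=":
            out += [(lhs, "<=", rhs), (rhs, "<=", lhs)]
        else:
            out.append((lhs, op, rhs))
    return out


def eliminate_all(constraints, s2_symbols, view_unfold, left_compose, right_compose):
    """Best-effort loop: returns (new constraints, symbols successfully eliminated)."""
    eliminated = []
    for sym in s2_symbols:                      # (1) view unfolding
        result = view_unfold(constraints, sym)
        if result is not None:
            constraints = result
            eliminated.append(sym)
    constraints = split_equalities(constraints)
    for sym in s2_symbols:
        if sym in eliminated:
            continue
        for step in (left_compose, right_compose):   # (2), else (3)
            result = step(constraints, sym)
            if result is not None:
                constraints = result
                eliminated.append(sym)
                break
    return constraints, eliminated


# Stub steps, for demonstration only.
def never(constraints, sym):
    return None


def drop_if_unused(constraints, sym):
    used = any(sym in (lhs + " " + rhs).split() for lhs, _, rhs in constraints)
    return None if used else constraints


cs = [("R", "=", "U"), ("R", "<=", "T")]
out, gone = eliminate_all(cs, ["S", "T"], never, never, drop_if_unused)
print(out)   # [('R', '<=', 'U'), ('U', '<=', 'R'), ('R', '<=', 'T')]
print(gone)  # ['S']
```

A failed step leaves the constraints unchanged, so the driver always returns something usable, which matches the best-effort goal stated earlier.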

Page 12: Implementing Mapping Composition

(1) View unfolding

- Idea: exploit equality constraints (if we have any)
- Standard technique: substitute view definition for occurrences of view relation in mappings

T = V - W, R ⊆ S ⋈ T, T × X ⊆ π(U)

⇒ R ⊆ S ⋈ (V - W), (V - W) × X ⊆ π(U)

- Body must not mention view relation itself
- Doesn’t matter what else is in body
- Can substitute everywhere
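View unfolding amounts to substitution on expression trees. The following is a minimal sketch under stated assumptions: relation names are strings, operators are tuples such as `("-", "V", "W")` or `("join", "S", "T")`, constraints are `(lhs, op, rhs)` triples, and the function names are hypothetical rather than taken from the implementation described in the talk.

```python
def mentions(expr, name):
    """True if relation `name` occurs anywhere in the expression tree."""
    if isinstance(expr, str):
        return expr == name
    return any(mentions(arg, name) for arg in expr[1:])


def substitute(expr, name, body):
    """Replace every occurrence of relation `name` in expr by `body`."""
    if isinstance(expr, str):
        return body if expr == name else expr
    return (expr[0],) + tuple(substitute(arg, name, body) for arg in expr[1:])


def view_unfold(constraints, name):
    """Use a constraint `name = body` (body not mentioning name) to eliminate name.

    Returns the remaining constraints with the body substituted everywhere,
    or None if no usable view definition exists.
    """
    for k, (lhs, op, rhs) in enumerate(constraints):
        if op == "=" and lhs == name and not mentions(rhs, name):
            rest = constraints[:k] + constraints[k + 1:]
            return [(substitute(l, name, rhs), o, substitute(r, name, rhs))
                    for l, o, r in rest]
    return None


constraints = [
    ("T", "=", ("-", "V", "W")),         # T = V - W
    ("R", "<=", ("join", "S", "T")),     # R <= S join T
]
print(view_unfold(constraints, "T"))
# [('R', '<=', ('join', 'S', ('-', 'V', 'W')))]
print(view_unfold(constraints, "S"))
# None
```

Because an equality is being used, the substitution is valid at every occurrence regardless of the surrounding operators, so no monotonicity check is needed at this step.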

Page 13: Implementing Mapping Composition

(2) Left compose

- “View unfolding” for containment constraints

π(V) ⊆ R - U, R ⊆ S ⋈ T

⇒ π(V) ⊆ (S ⋈ T) - U

- Needs monotonicity of expressions in R:

E1 ⊆ E2(R), R ⊆ E3 ≡ E1 ⊆ E2(E3)

if E2(R) is monotone in R (and R not in E3)

- Partial check for monotonicity

“Is S - (T - R) monotone in R?”
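One plausible way to implement a partial monotonicity check is to propagate a polarity through the expression tree. This is a sketch under assumptions, not the check used in the actual tool: expressions are nested tuples with relation names as strings, the `SIGNS` table records whether each argument position preserves or reverses containment, and the check is conservative in that it may answer "unknown" for an expression that is in fact monotone.

```python
MONO, ANTI, INDEP, UNKNOWN = "monotone", "antimonotone", "independent", "unknown"

# Sign of each argument position: +1 keeps the direction, -1 reverses it.
SIGNS = {"join": (1, 1), "union": (1, 1), "inter": (1, 1), "pi": (1,), "-": (1, -1)}


def flip(m):
    """Reverse monotone/antimonotone; leave the other answers unchanged."""
    return {MONO: ANTI, ANTI: MONO}.get(m, m)


def combine(a, b):
    """Merge the answers for two arguments of the same operator."""
    if a == INDEP:
        return b
    if b == INDEP or a == b:
        return a
    return UNKNOWN


def monotonicity(expr, name):
    """Classify how expr depends on relation `name`."""
    if isinstance(expr, str):
        return MONO if expr == name else INDEP
    signs = SIGNS.get(expr[0])
    if signs is None:            # operator with no declared behaviour
        return UNKNOWN
    result = INDEP
    for sign, arg in zip(signs, expr[1:]):
        m = monotonicity(arg, name)
        result = combine(result, flip(m) if sign < 0 else m)
    return result


print(monotonicity(("-", "S", ("-", "T", "R")), "R"))  # monotone
print(monotonicity(("-", "T", "R"), "R"))              # antimonotone
print(monotonicity(("-", "R", "R"), "R"))              # unknown
```

The last call shows why the check is only partial: R - R is always empty and therefore trivially monotone in R, but a purely syntactic polarity argument cannot see that.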

Page 14: Implementing Mapping Composition

Normalization for left compose

- Need one constraint of form R ⊆ E1
- Use identities to normalize, e.g.:
  - R ⊆ E1 and R ⊆ E2 iff R ⊆ E1 ∩ E2
  - E1 ∪ E2 ⊆ E3 iff E1 ⊆ E3 and E2 ⊆ E3
  - π(E1) ⊆ E2 iff E1 ⊆ E2 × D^r
- More identities in paper
- After left compose, try to eliminate D
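The first two identities, R ⊆ E1 and R ⊆ E2 iff R ⊆ E1 ∩ E2, and E1 ∪ E2 ⊆ E3 iff E1 ⊆ E3 and E2 ⊆ E3, are plain set-theoretic facts and can be spot-checked by brute force. The sketch below treats R, E1, E2, E3 as arbitrary subsets of a small universe (an assumption for illustration; in the algorithm they are relational expressions). An exhaustive check over a three-element universe is a sanity test, not a proof.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def identities_hold(universe):
    """Check both normalization identities on every triple of subsets."""
    for x, y, z in product(subsets(universe), repeat=3):
        # R <= E1 and R <= E2  iff  R <= E1 intersect E2   (R=x, E1=y, E2=z)
        if (x <= y and x <= z) != (x <= (y & z)):
            return False
        # E1 union E2 <= E3  iff  E1 <= E3 and E2 <= E3    (E1=x, E2=y, E3=z)
        if ((x | y) <= z) != (x <= z and y <= z):
            return False
    return True


print(identities_hold({1, 2, 3}))  # True
```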

Page 15: Implementing Mapping Composition

(3) Right compose

- Dual to left compose, from [NBM05]
- Example:

S ⋈ T ⊆ R, R - U ⊆ π(V)

⇒ (S ⋈ T) - U ⊆ π(V)

- Monotonicity check needed here too
- Normalization may introduce Skolem functions

E1 ⊆ π(E2) iff f(E1) ⊆ E2

- Must eliminate Skolem functions after composition
- Lots of effort coding this step!
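The role of the Skolem function in E1 ⊆ π(E2) iff f(E1) ⊆ E2 can be made concrete on a tiny instance. The sketch assumes E1 is a unary relation, E2 a binary relation, and π keeps the first column; under those assumptions a Skolem function f is simply a choice of witness for the projected-away column. The name `skolem_witness` is hypothetical.

```python
def skolem_witness(e1, e2):
    """If E1 is contained in pi_1(E2), return a dict f such that (x, f[x]) is
    in E2 for every (x,) in E1; otherwise return None."""
    f = {}
    for (x,) in e1:
        candidates = sorted(y for (a, y) in e2 if a == x)
        if not candidates:
            return None          # x is missing from pi_1(E2): containment fails
        f[x] = candidates[0]     # any witness works; pick the smallest
    return f


e1 = {(1,), (2,)}
e2 = {(1, "a"), (2, "b"), (2, "c")}

f = skolem_witness(e1, e2)
print(f == {1: "a", 2: "b"})                  # True
print({(x, f[x]) for (x,) in e1} <= e2)       # True, i.e. f(E1) is contained in E2
print(skolem_witness({(3,)}, e2))             # None
```

A witness function exists exactly when the containment holds, which is the content of the identity; the choice of f is not unique, which hints at why removing Skolem functions after composition takes extra work.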

Page 16: Implementing Mapping Composition

User-defined operators

- User specifies:
  - Monotonicity of operator in its arguments
    “If E1 monotone in R and E2 antimonotone in R or independent of R, then E1 * E2 monotone in R”
    “If E1 monotone in R or independent of R and E2 antimonotone in R, then E1 * E2 monotone in R”
  - Identities for normalization
    “E1 * E2 ⊆ E3 iff E1 ⊆ E2 ∪ E3”
- User-defined operators and standard relational operators treated uniformly
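Since the monotonicity properties are supplied by the user, it can be useful to sanity-check a declaration before relying on it. The sketch below is one possible approach and not something the talk describes: it exhaustively tests a binary set operator over all subsets of a small universe. The operator `star` is a stand-in (plain set difference) chosen because it is monotone in its first argument and antimonotone in its second, as in the quoted rules.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]


def respects(op, arg_index, direction, universe):
    """Exhaustively test a declared property of a binary set operator.

    direction=+1 tests 'monotone in argument arg_index',
    direction=-1 tests 'antimonotone in argument arg_index'.
    """
    sets = subsets(universe)
    for other in sets:
        for small in sets:
            for big in sets:
                if not small <= big:
                    continue
                lo = op(small, other) if arg_index == 0 else op(other, small)
                hi = op(big, other) if arg_index == 0 else op(other, big)
                ok = lo <= hi if direction > 0 else hi <= lo
                if not ok:
                    return False
    return True


def star(e1, e2):
    # Stand-in for a user-defined operator: here, plain set difference.
    return e1 - e2


print(respects(star, 0, +1, {1, 2}))  # True: monotone in the first argument
print(respects(star, 1, -1, {1, 2}))  # True: antimonotone in the second
print(respects(star, 1, +1, {1, 2}))  # False: not monotone in the second
```

Passing such a test on a small universe does not prove the declaration correct, but a failure gives a concrete counterexample to a wrong one.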

Page 17: Implementing Mapping Composition

Implementation

- 12K lines of C# code, command-line tool

    # Test case 13: PODS05 example 2
    SCHEMA R(2), S(2), T(2)
    CONSTRAINTS R <= S, P_{0,2} J_{0,1:1,2} (S S) <= R, S <= T
    ELIMINATE S;

Output:

    P_{0,2} J_{0,1:1,2}(R R) <= R,
    R <= T

Page 18: Implementing Mapping Composition

Experimental evaluation

- First attempt at a composition benchmark
- Schema editing and schema reconciliation scenarios
  - “Add a column to R to produce S”: (R) = S
- Measure
  - % of symbols eliminated
  - Running time
- As a function of
  - Editing primitives allowed, length of edit sequence, presence/absence of keys, starting schema size, …
- Synthetic data

Page 19: Implementing Mapping Composition

Summary of results

- Algorithm often effective in eliminating most or even all relation symbols from S2
- Running time in subsecond range even for large problems containing hundreds of constraints
- Certain schema editing primitives problematic
- Key constraints did not reduce effectiveness, although they did increase running time (and output size)

Page 20: Implementing Mapping Composition

Schema editing

[Figure: execution time (sec), axis 0 to 3.5, against run number, 0 to 100]

Random starting schema (30 relations of 2-10 attributes); 100 random edits; 100 different runs, sorted by execution time

Page 21: Implementing Mapping Composition

Schema reconciliation (1)

[Figure: fraction of symbols eliminated and execution time (sec), axis 0 to 1, against number of edits, 10 to 210]

Random schema (30 relations of 2-10 attributes), random edits. Each point represents the median time of the reconciliation step over 500 runs

Page 22: Implementing Mapping Composition

Schema reconciliation (2)

[Figure: fraction of symbols eliminated, axis 0 to 1, against schema size, 10 to 100; series: complete, no view unfolding, no right compose]

Random schema (variable # relations of 2-10 attributes); 100 random edits; 100 different runs, sorted by execution time

Page 23: Implementing Mapping Composition


Related work

[MH03] J. Madhavan, A. Y. Halevy. Composing mappings among data sources. VLDB, 2003.

[FKPT04] R. Fagin, Ph. G. Kolaitis, L. Popa, W.C. Tan. Composing schema mappings: second-order dependencies to the rescue. PODS, 2004.

[NBM05] A. Nash, P. A. Bernstein, S. Melnik. Composition of mappings given by embedded dependencies. PODS, 2005.

Page 24: Implementing Mapping Composition

Conclusion and future work

- We motivated and described the mapping composition problem
- We presented an implementation of a practical new algorithm for the composition problem
- We also presented an experimental evaluation
- To do: theoretical analysis of impact of user-defined operators
- To do: output constraints from algorithm can be a mess! How to clean up?