Page 1: Towards the Preservation of Keys in XML Data Transformation for Integration

Towards the Preservation of Keys in XML Data Transformation for Integration

Md. Sumon Shahriar and Jixue Liu

Data and Web Engineering Lab

Computer and Information Science

University of South Australia

Page 2: Towards the Preservation of Keys in XML Data Transformation for Integration

Outline of the Presentation

• Motivation for XML data transformation with XML keys

• How to define XML keys

• How to transform XML keys

• Whether transformed XML keys are valid and preserved [Key Preservation]

• If an XML key is not preserved, how to capture the XML key as an XML functional dependency (XFD) [Key Transition]

Page 3: Towards the Preservation of Keys in XML Data Transformation for Integration

Data Transformations for Integration

• Relational → Relational

• Relational → XML

• XML → Relational

• XML → XML

Page 4: Towards the Preservation of Keys in XML Data Transformation for Integration

Data Transformations for Integration with Constraints

• Relational → Relational

• Relational → XML

• XML → Relational

• XML → XML

• Well studied: constraint (keys, functional dependencies, etc.) preservation (a.k.a. propagation)

• Little investigated! Mostly structural transformations of schema and data, ignoring constraints!

• Reason: the document-centric rather than data-centric approach to XML

Page 5: Towards the Preservation of Keys in XML Data Transformation for Integration

Motivating Example 1

Source DTD Da (nested):

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>

Target DTD Db (flat-like):

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid)+)>

Operation: Unnest(sid)

Page 6: Towards the Preservation of Keys in XML Data Transformation for Integration

XML tree Ta (conforms to Da):

vr enroll
  v1 dept
    v3 dname: Physics
    v4 cid: Phys01
    v5 sid: 001
    v6 sid: 002
    v7 cid: Phys02
    v8 sid: 003
  v2 dept
    v9 dname: Chemistry
    v10 cid: Chem02
    v11 sid: 004
    v12 sid: 002

Unnest(sid)

XML tree Tb (conforms to Db):

vr enroll
  v1 dept
    v3 dname: Physics
    v4 cid: Phys01
    v5 sid: 001
    v6 cid: Phys01
    v7 sid: 002
    v8 cid: Phys02
    v9 sid: 003
  v2 dept
    v10 dname: Chemistry
    v11 cid: Chem02
    v12 sid: 004
    v13 cid: Chem02
    v14 sid: 002

Page 7: Towards the Preservation of Keys in XML Data Transformation for Integration

XML key consideration

Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>

Db:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid)+)>

Key on Da: K(enroll/dept,{cid})

• K is valid on Da

• K is satisfied by Ta

After Unnest(sid), key on Db: K(enroll/dept,{cid})

• Is K transformed? NO

• Is K valid on Db? YES

• Is K satisfied by Tb? NO
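
The answers above can be reproduced with a small Python sketch. It is a simplification, not the authors' method: it assumes a key with one single-step field and treats the key as satisfied when every selector node has at least one field value and no field value repeats within or across selector nodes. The function name `satisfies_single_field_key` and the XML strings `TA` and `TB` (transcribed from the trees Ta and Tb) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Documents transcribed from the XML trees Ta and Tb (an assumption of this sketch).
TA = (
    "<enroll>"
    "<dept><dname>Physics</dname><cid>Phys01</cid><sid>001</sid><sid>002</sid>"
    "<cid>Phys02</cid><sid>003</sid></dept>"
    "<dept><dname>Chemistry</dname><cid>Chem02</cid><sid>004</sid><sid>002</sid></dept>"
    "</enroll>"
)
TB = (
    "<enroll>"
    "<dept><dname>Physics</dname><cid>Phys01</cid><sid>001</sid>"
    "<cid>Phys01</cid><sid>002</sid><cid>Phys02</cid><sid>003</sid></dept>"
    "<dept><dname>Chemistry</dname><cid>Chem02</cid><sid>004</sid>"
    "<cid>Chem02</cid><sid>002</sid></dept>"
    "</enroll>"
)


def satisfies_single_field_key(xml_text, selector, field):
    """Simplified check of K(selector, {field}) for one single-step field."""
    root = ET.fromstring(xml_text)
    steps = selector.split("/")
    if root.tag != steps[0]:
        return False
    # Nodes reached by the selector; the first step names the root itself.
    targets = root.findall("/".join(steps[1:])) if len(steps) > 1 else [root]
    seen = set()
    for node in targets:
        values = [child.text for child in node.findall(field)]
        if not values:  # each selector node needs at least one field value
            return False
        for value in values:
            if value in seen:  # duplicate within or across selector nodes
                return False
            seen.add(value)
    return True


print(satisfies_single_field_key(TA, "enroll/dept", "cid"))  # True
print(satisfies_single_field_key(TB, "enroll/dept", "cid"))  # False
```

Tb fails because Unnest(sid) copies cid Phys01 and cid Chem02 next to each sid.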

Page 8: Towards the Preservation of Keys in XML Data Transformation for Integration

[Figure: XML trees Ta and Tb from Page 6, annotated with the labels "duplicates" (twice) and "distinct". In Tb, cid Phys01 occurs twice under v1 and cid Chem02 occurs twice under v2; in Ta, all cid values are distinct.]

Page 9: Towards the Preservation of Keys in XML Data Transformation for Integration

Observation

Observation 1: An XML key may not be preserved after transformation.

Page 10: Towards the Preservation of Keys in XML Data Transformation for Integration

Motivating Example 2

Source DTD Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>

Target DTD Db:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, course+)>

<!ELEMENT course (cid,sid+)>

Expand operation: replaces (cid,sid+) with course

K(enroll/dept,{cid}): valid and satisfied on the source

• Is K valid on Db? Answer: NO

• Reason: the path is transformed

• Suggestion: the key needs to be transformed, e.g. to K(enroll/dept/course,{cid})

• Satisfaction: may or may not hold; needs to be checked

Page 11: Towards the Preservation of Keys in XML Data Transformation for Integration

Expanding (cid,sid+) with new element course

Page 12: Towards the Preservation of Keys in XML Data Transformation for Integration

Observation

Observation 2: How XML keys should be transformed needs to be defined when the DTD is transformed.

Page 13: Towards the Preservation of Keys in XML Data Transformation for Integration

Contributing on

• Defining XML keys on DTD and their satisfaction

• Rules for transforming XML keys using important operations

• Key preservation [key to key]

• Defining XML functional dependencies (XFDs) and their satisfaction

• Key transition [key to XFD]

Page 14: Towards the Preservation of Keys in XML Data Transformation for Integration

Contributing on

Defining XML keys on DTD and their satisfaction

• Defined on the schema definition (DTD)

• Uses a novel technique to produce semantically correct values for key satisfaction

• Can capture some properties of the relational key, in the sense of value completeness and disallowing redundant values

• Can capture the ID properties of the DTD definition

• Improves on the key notion in XML Schema

Page 15: Towards the Preservation of Keys in XML Data Transformation for Integration

XML Key

Given a DTD $D = (EN, \beta, \rho)$, an XML key on $D$ is defined as $K(Q, \{P_1, \dots, P_l\})$, where $l \geq 0$, $Q$ is a complete path on $D$ called the selector, and $\{P_1, \dots, P_i, \dots, P_l\}$ (often denoted by $P$) is a set of fields where each $P_i$ is defined as $P_i = p_{i1} \cup \dots \cup p_{in_i}$, where "$\cup$" means disjunction and $p_{ij}$ ($j \in [1, \dots, n_i]$) is a simple path on $D$ with $\beta(last(p_{ij})) = Str$, and has the following syntax:

$p_{ij} = seq$

$seq = e \mid e/seq$, where $e \in EN$;

$Q/p_{ij}$ is a complete path.

Page 16: Towards the Preservation of Keys in XML Data Transformation for Integration

Example of XML keys

Source DTD Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>

<!ELEMENT dname (#PCDATA)>

<!ELEMENT cid (#PCDATA)>

<!ELEMENT sid (#PCDATA)>

K(enroll/dept,{cid})

• selector = enroll/dept

• field = {cid}

• β(cid) = #PCDATA means Str, so β(last(cid)) = Str

K(enroll/dept,{cid,sid})

• selector = enroll/dept

• fields = {cid,sid}

• β(last(cid)) = β(last(sid)) = Str
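
The validity conditions in these examples can be sketched in Python. This is only an illustration under stated assumptions: a DTD is modelled as a dictionary from element name to either a list of child element names or the string `"Str"`, occurrence markers such as `+` are ignored, and each field is a single simple path. The names `DTD_A`, `DTD_B`, `path_exists` and `key_is_valid` are hypothetical.

```python
# Da, and a variant Db in which (cid,sid+) is wrapped in a new element course.
DTD_A = {"enroll": ["dept"], "dept": ["dname", "cid", "sid"],
         "dname": "Str", "cid": "Str", "sid": "Str"}
DTD_B = {"enroll": ["dept"], "dept": ["dname", "course"], "course": ["cid", "sid"],
         "dname": "Str", "cid": "Str", "sid": "Str"}


def path_exists(dtd, root, path):
    """True if path starts at the root and each step is a child of the previous step."""
    steps = path.split("/")
    if steps[0] != root:
        return False
    for parent, child in zip(steps, steps[1:]):
        children = dtd.get(parent)
        if not isinstance(children, list) or child not in children:
            return False
    return True


def key_is_valid(dtd, root, selector, fields):
    """Selector must be a complete path; each selector/field must end in a Str element."""
    if not path_exists(dtd, root, selector):
        return False
    for field in fields:
        full_path = selector + "/" + field
        if not path_exists(dtd, root, full_path):
            return False
        if dtd.get(full_path.split("/")[-1]) != "Str":
            return False
    return True


print(key_is_valid(DTD_A, "enroll", "enroll/dept", ["cid", "sid"]))   # True
print(key_is_valid(DTD_B, "enroll", "enroll/dept", ["cid"]))          # False
print(key_is_valid(DTD_B, "enroll", "enroll/dept/course", ["cid"]))   # True
```

The second call shows why a key that was valid on Da stops being valid once a path in it is transformed.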

Page 17: Towards the Preservation of Keys in XML Data Transformation for Integration

Some definitions for XML key satisfactions

[P-tuple] Given a key $K(Q, \{P_1, \dots, P_l\})$ and a tree $T$, let $T^{Q}$ be a tree in $T$. A P-tuple in $T^{Q}$ is a tuple of pair-wise close sub-trees $(T^{P_1} \cdots T^{P_l})$.

• By pair-wise close, we mean that the sub-trees are in the same minimal hedge.

• A P-tuple is complete if $\forall\, T^{P_i} \in (T^{P_1} \cdots T^{P_l})\ (T^{P_i} \neq \emptyset)$.

• We call $T^{P} = T^{last(P)}$ the prefixed format tree. For example, if $P$ = enroll/dname, then $T^{P} = T^{dname}$.

Page 18: Towards the Preservation of Keys in XML Data Transformation for Integration

Proposed techniques

[Hedge] A hedge is a consecutive sequence of primary sub-trees of the same node.

[Minimal structure] Given a DTD definition β(e) and two elements e1 and e2 in β(e), the minimal structure g of e1 and e2 in β(e) is the pair of brackets that encloses both e1 and e2 such that no other structure inside g encloses both.

[Minimal hedge] Given a hedge H of β(e), a minimal hedge of e1 and e2 is one of the hedges H^g in H, that is, a hedge in H that conforms to the minimal structure g.

Page 19: Towards the Preservation of Keys in XML Data Transformation for Integration

Example of minimal structure, minimal hedge and P-tuple

[Figure: XML tree Ta from Page 6, with the minimal hedges H1, H2 and H3 of the structure g marked]

Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>

K(enroll/dept,{cid,sid})

• P1=cid, P2=sid

• Minimal structure is g=(cid,sid+)

• Minimal hedges are: H1=v4v5v6 and H2=v7v8 under node v1, and H3=v10v11v12 under node v2

• P-tuples are: F1=v4v5 and F2=v4v6 for hedge H1 and F3=v7v8 for hedge H2 under node v1; F4=v10v11 and F5=v10v12 for hedge H3 under node v2

Page 20: Towards the Preservation of Keys in XML Data Transformation for Integration

Produced P-tuples

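$F_1 = (T^{v_4} T^{v_5}) = ((v_4{:}\,\text{Phys01})(v_5{:}\,\text{001}))$

$F_2 = (T^{v_4} T^{v_6}) = ((v_4{:}\,\text{Phys01})(v_6{:}\,\text{002}))$

$F_3 = (T^{v_7} T^{v_8}) = ((v_7{:}\,\text{Phys02})(v_8{:}\,\text{003}))$

$F_4 = (T^{v_{10}} T^{v_{11}}) = ((v_{10}{:}\,\text{Chem02})(v_{11}{:}\,\text{004}))$

$F_5 = (T^{v_{10}} T^{v_{12}}) = ((v_{10}{:}\,\text{Chem02})(v_{12}{:}\,\text{002}))$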

Page 21: Towards the Preservation of Keys in XML Data Transformation for Integration

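XML Key Satisfaction

An XML tree $T$ satisfies a key $K(Q, \{P_1, \dots, P_l\})$ if the following hold:

• If $\{P_1, \dots, P_l\} = \emptyset$, then $T$ satisfies $K$ iff there exists one and only one $T^{Q}$ in $T$;

• Else:

  • for every $T^{Q}$ in $T$, there exists at least one P-tuple in $T^{Q}$;

  • for every $T^{Q}$ in $T$, every P-tuple in $T^{Q}$ is complete;

  • for every $T^{Q}$ in $T$, every P-tuple in $T^{Q}$ is value distinct;

  • for every $T^{Q}_1, T^{Q}_2$ in $T$: if there exist two P-tuples $(T^{P_1}_1 \cdots T^{P_l}_1) \in T^{Q}_1$ and $(T^{P_1}_2 \cdots T^{P_l}_2) \in T^{Q}_2$ with $(T^{P_1}_1 \cdots T^{P_l}_1) =_v (T^{P_1}_2 \cdots T^{P_l}_2)$, then $T^{Q}_1 \equiv T^{Q}_2$.

The last condition requires that P-tuples in different $T^{Q}$ must be value distinct.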

Page 22: Towards the Preservation of Keys in XML Data Transformation for Integration

Checking satisfaction of key

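$T^{Q} = T^{v_1}$:

$F_1 = (T^{v_4} T^{v_5}) = ((v_4{:}\,\text{Phys01})(v_5{:}\,\text{001}))$

$F_2 = (T^{v_4} T^{v_6}) = ((v_4{:}\,\text{Phys01})(v_6{:}\,\text{002}))$

$F_3 = (T^{v_7} T^{v_8}) = ((v_7{:}\,\text{Phys02})(v_8{:}\,\text{003}))$

$T^{Q} = T^{v_2}$:

$F_4 = (T^{v_{10}} T^{v_{11}}) = ((v_{10}{:}\,\text{Chem02})(v_{11}{:}\,\text{004}))$

$F_5 = (T^{v_{10}} T^{v_{12}}) = ((v_{10}{:}\,\text{Chem02})(v_{12}{:}\,\text{002}))$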
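
The check on these P-tuples can be sketched in Python for the case of a non-empty field set. The sketch assumes the P-tuples have already been produced and reduced to tuples of values grouped by selector tree, with `None` standing for a missing field; the name `satisfies_key` and the dictionary layout are assumptions of this illustration.

```python
# P-tuple values of K(enroll/dept,{cid,sid}) on Ta, grouped by selector tree.
TA_TUPLES = {
    "v1": [("Phys01", "001"), ("Phys01", "002"), ("Phys02", "003")],
    "v2": [("Chem02", "004"), ("Chem02", "002")],
}


def satisfies_key(tuples_by_selector_tree):
    """Check existence, completeness and value distinctness of the P-tuples."""
    seen = set()
    for tuples in tuples_by_selector_tree.values():
        if not tuples:                        # at least one P-tuple per selector tree
            return False
        for values in tuples:
            if any(v is None for v in values):  # every P-tuple must be complete
                return False
            if values in seen:                # distinct within and across selector trees
                return False
            seen.add(values)
    return True


print(satisfies_key(TA_TUPLES))  # True
```

A single set is enough for both distinctness conditions, because a repeated value tuple violates the key whether it recurs in the same selector tree or in a different one.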

Page 23: Towards the Preservation of Keys in XML Data Transformation for Integration

Contributing on

Rules for transformation of the key definition

• A key is transformed if any path in the key is transformed.

• After the transformation, the key needs to be checked for validity on the target schema.

• If a key is not transformed, it is valid on the target DTD.

Page 24: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Unnest operation:

• g=(g1xg2+)+ → g=(g1xg2)+

• Example: (cid,sid+)+ → (cid,sid)+

• It turns the nested structure into a flat-like structure

• No path transformation

• No change in the key definition
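
The effect of the unnest operation on data can be sketched in Python for the example (cid,sid+)+ → (cid,sid)+. This is a minimal illustration, assuming the children of one dept node are given as `(tag, value)` pairs; the function name `unnest` and its parameters are hypothetical.

```python
# Children of the Physics dept node of Ta as (tag, value) pairs.
PHYSICS = [("dname", "Physics"), ("cid", "Phys01"), ("sid", "001"), ("sid", "002"),
           ("cid", "Phys02"), ("sid", "003")]


def unnest(children, head="cid", tail="sid"):
    """Rewrite (head, tail+)+ as (head, tail)+ by repeating the head before each tail."""
    result = []
    current_head = None
    for tag, value in children:
        if tag == head:
            current_head = value                 # remember it, emit it with each tail
        elif tag == tail:
            result.append((head, current_head))
            result.append((tail, value))
        else:
            result.append((tag, value))          # other elements are copied unchanged
    return result


for tag, value in unnest(PHYSICS):
    print(tag, value)
```

The repeated head values are exactly the duplicates that can break a key whose only field is cid.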

Page 25: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Nest operation:

• g=(g1xg2)+ → g=(g1xg2+)+

• Example: (cid,sid)+ → (cid,sid+)+

• It turns the flat-like structure into a nested structure

• No path transformation

• No change in the key definition

Page 26: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Expand operation:

• g=(g1xg2+)+ → g=(gnew)+, gnew=g1xg2+

• Example: g=(cid,sid+)+ → g=(course)+, course=(cid,sid+)

• It pushes the structure one level down

• The path is transformed in the DTD and so in the key

• Needs some rules to transform the key correctly

Page 27: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Transformation rules on key using expand:

• Depends on where the new element is added in the key paths (either selector or field)

K(enroll/dept,{cid,sid}) → expand((cid,sid+), course) → K(enroll/dept/course,{cid,sid}) or K(enroll/dept,{course/cid,course/sid})

K(enroll/dept,{cid,sid}) → expand(sid+, stIDs) → K(enroll/dept,{cid,stIDs/sid})

Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>
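
The two expand examples on this slide can be reproduced with a small Python sketch. The rule it encodes is inferred from those examples only and is an assumption, not the authors' stated rule: fields that start inside the wrapped structure get the new element as a prefix, and if every field starts inside the wrapped structure, the new element may instead be appended to the selector. The name `expand_key` is hypothetical.

```python
def expand_key(selector, fields, wrapped, new_element):
    """Return candidate keys after wrapping the elements in `wrapped` in `new_element`."""
    touched = [f for f in fields if f.split("/")[0] in wrapped]
    if not touched:
        return [(selector, list(fields))]        # no key path is transformed
    candidates = []
    if len(touched) == len(fields):
        # All fields sit inside the new element: it can extend the selector.
        candidates.append((selector + "/" + new_element, list(fields)))
    # Otherwise (or alternatively) the new element prefixes the affected fields.
    candidates.append(
        (selector, [new_element + "/" + f if f in touched else f for f in fields])
    )
    return candidates


print(expand_key("enroll/dept", ["cid", "sid"], {"cid", "sid"}, "course"))
print(expand_key("enroll/dept", ["cid", "sid"], {"sid"}, "stIDs"))
```

Each candidate still has to be checked for validity on the target DTD and for satisfaction by the target document.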

Page 28: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Collapse operation:

• g=(gcoll)+, gcoll=g1xg2+ → g=(g1xg2+)+

• Example: g=(dept+), gdept=(cid,sid+) → g=(cid,sid+)+

• It moves the structure one level up

• The path is transformed in the DTD and so in the key

• Needs some rules to transform the key correctly

Page 29: Towards the Preservation of Keys in XML Data Transformation for Integration

Transformation on key

• Transformation rules on key using collapse:

• Depends on which element is deleted in the key paths (either selector or field)

K(enroll/dept,{cid,sid}) → collapse(dept) → K(enroll,{cid,sid})

K(enroll,{dept/cid,dept/sid}) → collapse(dept) → K(enroll,{cid,sid})

Da:

<!ELEMENT enroll (dept+)>

<!ELEMENT dept (dname, (cid,sid+)+)>
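
Both collapse examples amount to deleting the collapsed element from whichever key path contains it. A minimal Python sketch under that reading follows; the name `collapse_key` is hypothetical, and the rule is inferred from the two examples rather than taken from a stated definition.

```python
def collapse_key(selector, fields, removed_element):
    """Drop the collapsed element from the selector path and from every field path."""
    def drop(path):
        return "/".join(step for step in path.split("/") if step != removed_element)

    return drop(selector), [drop(field) for field in fields]


print(collapse_key("enroll/dept", ["cid", "sid"], "dept"))
print(collapse_key("enroll", ["dept/cid", "dept/sid"], "dept"))
```

Both calls yield the selector enroll with the fields cid and sid, matching the two transformations shown.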

Page 30: Towards the Preservation of Keys in XML Data Transformation for Integration

Contributing on

[Key preservation]

Given a source DTD, its conforming document, and a valid key that is satisfied by the document, if the transformed key is valid on the target DTD and is satisfied by the target document, then the key is said to be preserved by the transformation.

Page 31: Towards the Preservation of Keys in XML Data Transformation for Integration

Key preserving properties of operations

• Preserving: Nest and Collapse

• Preserving with necessary and sufficient conditions: Unnest and Expand

Page 32: Towards the Preservation of Keys in XML Data Transformation for Integration

Theorem:

The unnest operator is key preserving if some key fields do not cross g1.

Page 33: Towards the Preservation of Keys in XML Data Transformation for Integration

Example to explain

Unnest(sid) on (cid,sid+)+, with g1=cid and g2=sid

K(enroll/dept,{cid}): the only key field, cid, crosses g1, so the key is not preserved.

However, if the key is K(enroll,{cid,sid}), then the key is preserved.

Page 34: Towards the Preservation of Keys in XML Data Transformation for Integration

Theorem:

The expand operator is key preserving if, whenever the selector is transformed, every tree for the selector has a P-tuple.

Page 35: Towards the Preservation of Keys in XML Data Transformation for Integration

Example to explain

K(enroll/dept,{cid}) → K(enroll/dept/course,{cid}) or K(enroll/dept,{course/cid})

No duplicate cid values are produced; the cid values remain distinct.

Page 36: Towards the Preservation of Keys in XML Data Transformation for Integration

Contributing on

[Key transition] Given a source DTD, its conforming document, and a valid key that is satisfied by the document, if the transformed key is valid on the target DTD but is not satisfied by the target document, and the key, once transformed to an XFD, is satisfied by the target document, then we say the XML key is transited as an XFD.

Page 37: Towards the Preservation of Keys in XML Data Transformation for Integration

XML functional dependency (XFD)

Given a DTD $D = (EN, \beta, \rho)$, an XML functional dependency (XFD) on $D$ is defined as $\Phi(S, P \rightarrow Q)$, where $S$ is a complete path on $D$ called the scope, $P = \{p_1, \dots, p_i, \dots, p_l\}$ is a set of simple paths called the determinant or LHS, $Q$ is a simple path or the empty path $\epsilon$ called the dependent or RHS, and $S/P$ and $S/Q$ are complete paths.

If $Q = \epsilon$, then the XFD $\Phi(S, P \rightarrow \epsilon)$ implies that $P \rightarrow last(S)$, meaning that $P$ determines $S$.

Page 38: Towards the Preservation of Keys in XML Data Transformation for Integration

Tuple for XFD

[Tuple] Given an XFD $\Phi(S, P \rightarrow Q)$ and a tree $T$, let $T^{S}$ be a tree in $T$. A tuple in $T^{S}$ is a tuple of pair-wise close sub-trees.

• By pair-wise close, we mean that the sub-trees are in the same minimal hedge.

• By P-tuple, we mean the tuple for the paths $P$.

• By Q-tuple, we mean the tuple for the path $Q$.

• A P-tuple is complete if $\forall\, T^{p_i} \in (T^{p_1} \cdots T^{p_l})\ (T^{p_i} \neq \emptyset)$.

• A Q-tuple is complete if $T^{Q} \neq \emptyset$.

Page 39: Towards the Preservation of Keys in XML Data Transformation for Integration

XFD satisfactions

An XML tree satisfies an XFD $\Phi(S, P \rightarrow Q)$ if the following hold:

• If $Q = \epsilon$, then for every $F[P] \in T^{S}$, $F[P]$ is complete;

• Else, for every $(F[P], F[Q]) \in T^{S}$, $F[P]$ and $F[Q]$ are complete.

• For every pair of tuples $F_1$ and $F_2$ in $T^{S}$, if $F_1[P] =_v F_2[P]$, then $F_1[Q] =_v F_2[Q]$.
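
For the case of a non-empty dependent Q, the conditions above can be sketched in Python. The sketch assumes the tuples are already extracted as `(P-values, Q-value)` pairs grouped by scope tree, with `None` for a missing value. The function name `satisfies_xfd` is hypothetical, and the XFD used as data, Φ(enroll/dept, {cid} → dname), is an illustrative example that does not appear in the slides.

```python
# Tuples of the illustrative XFD Φ(enroll/dept, {cid} → dname), grouped by scope tree.
TUPLES = {
    "v1": [(("Phys01",), "Physics"), (("Phys01",), "Physics"), (("Phys02",), "Physics")],
    "v2": [(("Chem02",), "Chemistry"), (("Chem02",), "Chemistry")],
}


def satisfies_xfd(tuples_by_scope_tree):
    """Within each scope tree, complete tuples with equal P-values need equal Q-values."""
    for tuples in tuples_by_scope_tree.values():
        dependent_of = {}
        for p_values, q_value in tuples:
            if q_value is None or any(v is None for v in p_values):
                return False                      # P-tuple or Q-tuple is incomplete
            if dependent_of.setdefault(p_values, q_value) != q_value:
                return False                      # same determinant, different dependent
        # Mappings are not shared across scope trees: the XFD is checked per scope.
    return True


print(satisfies_xfd(TUPLES))  # True
```

Unlike a key, the XFD tolerates repeated P-values as long as they always come with the same Q-value.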

Page 40: Towards the Preservation of Keys in XML Data Transformation for Integration

Key transition algorithm

check := CheckKeyTransformation(k, UnNest);
if check = TRUE then
    Φ := TransformKeyToXFD(k);
end if
if target T satisfies the XFD Φ then
    return Φ and "KeyTransited";
end if

Page 41: Towards the Preservation of Keys in XML Data Transformation for Integration

Function CheckKeyTransformation(k, UnNest)

if g1 crosses any Pi in [P1, · · · , Pn] at an element e, where e in g1 and e in Pi then
    return TRUE;
else
    return FALSE;
end if

Page 42: Towards the Preservation of Keys in XML Data Transformation for Integration

Function TransformKeyToXFD(k)

Φ[S] := k[Q];
for all i such that 1 ≤ i ≤ n do
    Φ[Pi] := k[Pi];
end for
Φ[Q] := ε;
return Φ(S, {P} → Q);
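
The three pieces of pseudocode (the transition algorithm, CheckKeyTransformation and TransformKeyToXFD) can be put together in a Python sketch. Several parts are assumptions of this illustration rather than the authors' definitions: "g1 crosses Pi" is read as "some step of Pi is an element of g1", the XFD with an empty dependent is treated as satisfied when every P-tuple is complete and equal P-values never occur under two different scope trees, and all names and data layouts are hypothetical.

```python
def check_key_transformation(fields, g1):
    """TRUE if g1 crosses some key field, i.e. a field path contains an element of g1."""
    return any(step in g1 for field in fields for step in field.split("/"))


def transform_key_to_xfd(selector, fields):
    """Selector becomes the scope, fields become the determinant, dependent is empty."""
    return {"scope": selector, "lhs": list(fields), "rhs": None}


def xfd_with_empty_rhs_holds(p_tuples_by_scope_tree):
    """Assumed reading of P → last(S): equal P-values always lie in the same scope tree."""
    owner = {}
    for scope_tree, tuples in p_tuples_by_scope_tree.items():
        for values in tuples:
            if any(v is None for v in values):
                return False
            if owner.setdefault(values, scope_tree) != scope_tree:
                return False
    return True


def key_transition(selector, fields, g1, target_p_tuples):
    """Return (XFD, 'KeyTransited') if the key can be transited, otherwise None."""
    if not check_key_transformation(fields, g1):
        return None
    xfd = transform_key_to_xfd(selector, fields)
    if xfd_with_empty_rhs_holds(target_p_tuples):
        return xfd, "KeyTransited"
    return None


# cid values of Tb grouped by dept node; Unnest(sid) has g1 = cid.
TB_CIDS = {"v1": [("Phys01",), ("Phys01",), ("Phys02",)], "v2": [("Chem02",), ("Chem02",)]}
print(key_transition("enroll/dept", ["cid"], {"cid"}, TB_CIDS))
```

On Tb the duplicated cid values stay under the same dept node, so the key K(enroll/dept,{cid}) that Tb no longer satisfies is captured as an XFD with scope enroll/dept and determinant {cid}.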

Page 43: Towards the Preservation of Keys in XML Data Transformation for Integration

[Figure: XML trees Ta and Tb from Page 6, annotated with the labels "duplicates" (twice) and "distinct". In Tb, cid Phys01 occurs twice under v1 and cid Chem02 occurs twice under v2.]

Key on Ta: K(enroll/dept,{cid})

XFD on Tb: Φ(enroll/dept,{cid} → ε)

Page 44: Towards the Preservation of Keys in XML Data Transformation for Integration

Theorem: An XML key on the source DTD can only be transited to an XFD on the target DTD if the key is satisfied by the conforming source document.

Page 45: Towards the Preservation of Keys in XML Data Transformation for Integration

Talked on

XML data transformation with keys A new definition for XML keys Transformation rules for keys Key preservations Key transition Also a new definition for XML

functional dependency (XFD)

Page 46: Towards the Preservation of Keys in XML Data Transformation for Integration

our papers

• “On Defining Keys for XML”, IEEE CIT 2008, Database and Data Mining Workshop, Sydney.

• “Key Preserving P2P Data Transformation for XML”, LNCS, DBISP2P 2008 (VLDB Workshop), Auckland, New Zealand.

• “Transition of keys in XML Data Transformation”, IEEE CSA 2008, Hobart.

• “On Defining Functional Dependency for XML”, IEEE IWSCA 2008, Korea.

Page 47: Towards the Preservation of Keys in XML Data Transformation for Integration

Other research issues

Already done

• “Preserving functional dependency in XML data transformation”, LNCS, ADBIS 2008, Finland.

• Preserving inclusion dependency in XML data transformation

Future work

• Adaptation of constraints in XML data integration

  • Detecting conflicts between source constraints and target constraints in XML settings

• Checking validation and satisfaction of the constraints

  • XML keys, XFDs and XML inclusion dependencies (XIDs)

• Performance in XML data transformation and integration with constraints

Page 48: Towards the Preservation of Keys in XML Data Transformation for Integration

Thank You

Questions