
1

IVOX: Incremental View Maintenance for Ordered XML

DSRG Talk, WPI, February 20th, 2003

Students: Katica Dimitrova & Maged El Sayed
Advisor: Prof. Elke Rundensteiner

2

Outline

Motivation
Problem Description
Background
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work


4

Motivation

Views in general
  Data warehouses
  Information integration
  Access control, privacy, etc.

XML views (extra useful)
  Information inter-portability
  Crossing gaps between different data models

Materialized views
  Speed up data retrieval
  Query optimization
  Increased availability

[Figure: a view, defined by a view definition query, over RDB, XML, and other sources.]

5

Maintaining Materialized Views

When sources are updated, a materialized view may become inconsistent.

Methods of view maintenance
  Recomputation: recompute the view from scratch from the base data
  Incremental view maintenance: compute changes to the view in response to changes to the base sources

Heuristic: incremental view maintenance is usually cheaper than full recomputation.

6

Outline

Motivation
Problem Description
Background
  The XAT Algebra
  XML order in the XAT Context
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

7

The Problem

Previous work for:
  Relational [GMS93], bag semantics [GL95], [ZGHW95], [PSCP02]
  Object-relational [LVM00]
  Object-oriented [AFP02]
  Structured data models [AMRVW98], [ZM98]
  XML data model, not handling order [LD00]

Can techniques for other data models be reused for XML?

8

Is Maintaining XML Views Different?

XML features
  Hierarchical
  Optional elements
  Self-typed
  References
  Ordered

Expressiveness of the view definition language
  Complex operations: tagging, unnesting, aggregation, ...
  Expected large auxiliary information

9

Example

Bib.xml

<bib>
  <book>
    <price>65.95</price>
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <price>39.95</price>
    <title>Data on the Web</title>
  </book>
</bib>

View Definition Query: list all books that cost less than $60, including their title and price

<result>{
  for $b in document("bib.xml")/bib/book
  where $b/price/text() < 60
  return <book>{ $b/title, $b/price }</book>
}</result>

View Extent

<result>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</result>

10

Example

Update: insert element <price>55.48</price> into the second book

Bib.xml (before the update)

<bib>
  <book>
    <price>65.95</price>
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
  </book>
  <book>
    <price>39.95</price>
    <title>Data on the Web</title>
  </book>
</bib>

View Definition Query

<result>{
  for $b in document("bib.xml")/bib/book
  where $b/price/text() < 60
  return <book>{ $b/title, $b/price }</book>
}</result>

View Extent (before the update)

<result>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</result>

Element that the update adds to the view extent

<book>
  <title>TCP/IP Illustrated</title>
  <price>55.48</price>
</book>
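The two maintenance methods can be contrasted on this example with a small Python sketch. It is an illustration only, not the IVOX implementation: the helper names (`qualifies`, `view_book`, `recompute`, `propagate_price_insert`) are hypothetical, and the incremental function handles just one case, a price inserted into a book that was not yet in the view.

```python
import copy
import xml.etree.ElementTree as ET

BIB_XML = (
    "<bib>"
    "<book><price>65.95</price>"
    "<title>Advanced Programming in the Unix environment</title></book>"
    "<book><title>TCP/IP Illustrated</title></book>"
    "<book><price>39.95</price><title>Data on the Web</title></book>"
    "</bib>"
)

def qualifies(book):
    # where $b/price/text() < 60: true if some price child is below 60
    return any(float(p.text) < 60 for p in book.findall("price"))

def view_book(book):
    # return <book>{ $b/title, $b/price }</book>
    out = ET.Element("book")
    for child in book.findall("title") + book.findall("price"):
        out.append(copy.deepcopy(child))
    return out

def recompute(bib):
    """Recomputation: evaluate the view query over all of the base data."""
    result = ET.Element("result")
    for book in bib.findall("book"):
        if qualifies(book):
            result.append(view_book(book))
    return result

def propagate_price_insert(bib, view, book_index):
    """Incremental sketch (assumption: the book was not in the view before)."""
    books = bib.findall("book")
    if not qualifies(books[book_index]):
        return
    # Without an order encoding, the position must be found by scanning
    # the books that precede the updated one in document order.
    position = sum(1 for b in books[:book_index] if qualifies(b))
    view.insert(position, view_book(books[book_index]))

def titles(view):
    return [b.findtext("title") for b in view.findall("book")]

bib = ET.fromstring(BIB_XML)
view = recompute(bib)                      # initial materialization
ET.SubElement(bib.findall("book")[1], "price").text = "55.48"  # source update
propagate_price_insert(bib, view, 1)       # maintain instead of recomputing
```

The scan that finds `position` is the kind of order bookkeeping that the order encoding discussed later in the talk is meant to avoid.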

11

Our Goal

Design an incremental view maintenance strategy for XQuery views that:
  Correctly updates the view
  Is order sensitive
    Returns the view in proper order
    Allows for updates that specify order
  Covers at least the "core" of XQuery language views
  Minimizes auxiliary information requirements

12

Basics of IVOX Approach: Algebraic

Update propagation rules for each algebra operator and each update type

[Figure: the XQuery definition is represented as an algebra tree over the XML sources that produces the XML view. At execution time an operator maps input D1 to output D2; at view maintenance time the same operator maps an update on D1 to an update on D2.]

13

Why Algebraic?

Robust: easily adaptable to operator semantic changes
Extensible: new operators can be added
Allows for reuse of techniques for known operators
Language independent: independent of syntax changes (of XQuery by W3C)
Formal: basis for provable correctness

14

Outline

Motivation
Problem Description
Background
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

15

Background on XML Algebra XAT

XAT operators
  SQL operators: Select, Project, ...
  Special operators: Source, FOR, ...
  XML operators: Navigate, Tagger, ...

XAT data model (XAT table)
  Order-sensitive table of tuples
  Columns denote user-specified or internally generated variable bindings
  A cell in a tuple holds an XML node or a sequence of XML nodes

[Figure: an operator labeled "$col1, price $col3" takes an input XAT table with column $b, holding the three <book> elements, and produces an output table with columns $b and $col3 that pairs each priced <book> with its <price> (65.95 and 39.95).]

16

Order in XAT Context

Order among tuples
Order among XML nodes in a cell

[Figure: the input XAT table ($b) and the output XAT table ($b, $col3) of the operator "$col1, price $col3".]

17

Order in the XAT Context

Order among the tuples
Order among XML nodes in a single cell

[Figure: Agg $col5 turns an input table with two tuples in column $col5, a <book> for "TCP/IP ..." (price 55.48) and a <book> for "Data ..." (price 39.95), into a single cell holding the sequence of both <book> elements.]

18

Order in XAT Context: View Maintenance

On update, worry about:
  Order among tuples
  Order among XML nodes in a cell

[Figure: after <price>55.48</price> is added to the "TCP/IP ..." <book> in the input table ($b), the output table ($b, $col3) of the operator "$col1, price $col3" receives a new tuple with <price> 55.48 between the tuples for <price> 65.95 and <price> 39.95.]

19

Order in XAT Context & View Maintenance

On update, worry about:
  Order among the tuples
  Order among XML nodes in a single cell

[Figure: Agg $col5 turns the two <book> tuples in column $col5 ("TCP/IP ...", price 55.48, and "Data ...", price 39.95) into a single cell holding the sequence of both <book> elements.]

20

Duplicate Information in XAT Context

Complex operations require auxiliary information
Auxiliary information can be too large in the XAT context
It may be expensive to maintain

[Figure: the input table ($b) and the output table ($b, $col3) of the operator "$col1, price $col3" both store the full <book> elements, and the <price> elements are stored again in $col3. Duplicated storage!]

21

Outline

Motivation
Problem Description
Background
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

22

Possible Solutions to Order Preservation (I)

Sequential storage (XPROP approach by Maged, Ling & Luping)
  Assume intermediate results are stored sequentially
  Inserts and deletes are performed in physical order
  No order encoding

Special support required for secondary storage
May require iteration over many tuples to determine order

[Figure: inserting <price>55.48</price> into the input table ($b) of the operator "$col1, price $col3"; the new output tuple (<book>, <price> 55.48) is physically placed between the tuples for <price> 65.95 and <price> 39.95.]

23

Possible Solutions to Order Preservation (II)

Naïve order encoding for tuples and sequences of XML nodes
  Assign order numbers to tuples and to XML nodes in a sequence
  Requires frequent renumbering on inserts

[Figure: the input table ($b) of the operator "$col1, price $col3" carries an Ord column 1, 2, 3 and the output table ($b, $col3) an Ord column 1, 2. Inserting <price>55.48</price> adds an output tuple with Ord 2, so the tuple for <price> 39.95 is renumbered from 2 to 3.]
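The renumbering cost of the naïve encoding can be made concrete with a short Python sketch. The function name `insert_with_renumbering` and the row layout (a list of `(ord, value)` pairs) are assumptions made for illustration, not part of the talk.

```python
def insert_with_renumbering(rows, position, value):
    """Insert value at a 1-based position into rows of (ord, value) pairs.

    Returns the new rows and the number of existing rows whose order
    number had to change.
    """
    renumbered = 0
    out = []
    for ord_, existing in rows:
        if ord_ >= position:
            # Every row at or after the insert position shifts by one.
            out.append((ord_ + 1, existing))
            renumbered += 1
        else:
            out.append((ord_, existing))
    out.append((position, value))
    out.sort(key=lambda row: row[0])
    return out, renumbered

# Output table of the slide, reduced to the price column.
prices = [(1, "65.95"), (2, "39.95")]
new_prices, touched = insert_with_renumbering(prices, 2, "55.48")
```

In the worst case (an insert at the front) every existing row is touched, which is the behaviour the order encoding chosen later is designed to avoid.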

24

Using Node Identity

Idea: use node identity

Usage:
  For encoding order and structure
  As a reference to base data

25

What Encoding For Node Identity?

Existing techniques for encoding order for XML
  Global Order (UW)  (illustrated on this slide)
  Local Order (UW)
  Dewey Order (UW)
  Lexicographical Order (MASS)

Node          Global order
bib           1
  book        2
    price     3
    title     4
  book        5
    title     6
  book        7
    price     8
    title     9

Inserting a price into the second book gives it number 6 and renumbers the following nodes to 7, 8, 9 and 10.

26

What Encoding For Node Identity?

Existing techniques for encoding order for XML
  Global Order (UW)
  Local Order (UW)  (illustrated on this slide)
  Dewey Order (UW)
  Lexicographical Order (MASS)

Node          Local order
bib           1
  book        1
    price     1
    title     2
  book        2
    title     1
  book        3
    price     1
    title     2

Inserting a price into the second book gives it number 1 and renumbers the title of that book to 2.

27

What Encoding For Node Identity?

Existing techniques for encoding order for XML
  Global Order (UW)
  Local Order (UW)
  Dewey Order (UW)  (illustrated on this slide)
  Lexicographical Order (MASS)

Node          Dewey order
bib           1
  book        1.1
    price     1.1.1
    title     1.1.2
  book        1.2
    title     1.2.1
  book        1.3
    price     1.3.1
    title     1.3.2

Inserting a price into the second book gives it 1.2.1 and renumbers the title of that book to 1.2.2.
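A minimal Python sketch of Dewey labelling, written for this transcript rather than taken from the talk, shows why an insert still forces renumbering of following siblings. The function name `dewey_labels` is hypothetical, and the sample document keeps only the element names of bib.xml.

```python
import xml.etree.ElementTree as ET

def dewey_labels(root):
    """Return (label, tag) pairs in document order, e.g. ('1.2.1', 'title')."""
    labels = []

    def visit(node, path):
        labels.append((".".join(str(n) for n in path), node.tag))
        # A child's label is the parent's label plus its sibling position.
        for position, child in enumerate(node, start=1):
            visit(child, path + [position])

    visit(root, [1])
    return labels

bib = ET.fromstring(
    "<bib>"
    "<book><price/><title/></book>"
    "<book><title/></book>"
    "<book><price/><title/></book>"
    "</bib>"
)
before = dewey_labels(bib)
bib[1].insert(0, ET.Element("price"))   # new price as first child of book 2
after = dewey_labels(bib)
```

Only the siblings after the insert point (and their descendants) change labels, which is cheaper than global renumbering but still not free.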

28

What Encoding For Node Identity?

Existing techniques for encoding order for XML
  Global Order (UW)
  Local Order (UW)
  Dewey Order (UW)
  Lexicographical Order (MASS)  (illustrated on this slide: The Winner)

Node          Lexicographical key
bib           b
  book        b.b
    price     b.b.b
    title     b.b.cd
  book        b.d
    title     b.d.f
  book        b.f
    price     b.f.cm
    title     b.f.l

Inserting a price into the second book gives it key b.d.b; no existing key changes.

29

Lexicographical Keys: LexKeys

What are LexKeys?
  Multi-level lexicographical keys
  Examples: c, ba.c.b

Examples of comparison
  b < b.c
  bab < bd.cc
  b.b < b.b.c

Advantages
  All LexKeys form a totally ordered set with respect to <
  It is always possible to generate a key between two keys
  The deletion of a LexKey in a sequence does not affect other LexKeys

Usage
  Reference to XML nodes
  Encoding order
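The comparison rule and the "key between two keys" property can be sketched in Python as follows. This is one possible scheme under stated assumptions, not the MASS algorithm: a LexKey is modelled as a tuple of lowercase strings, one per level, and single-level keys never end in the letter a so that there is always room before any key. The names `parse` and `between` are hypothetical.

```python
def parse(key):
    """Split a LexKey such as 'b.d.f' into its levels."""
    return tuple(key.split("."))

def between(lo, hi=None):
    """Return a single-level key s with lo < s (and s < hi if hi is given).

    Assumptions: keys use the letters a..z, never end in 'a', and lo < hi.
    lo may be the empty string, meaning "no lower neighbour".
    """
    assert not lo.endswith("a") and (hi is None or not hi.endswith("a"))
    assert hi is None or lo < hi
    if hi is None or not hi.startswith(lo):
        # lo already differs from hi before its end, so any extension
        # of lo stays below hi.
        return lo + "n"
    rest = hi[len(lo):]                      # non-empty, does not end in 'a'
    zeros = len(rest) - len(rest.lstrip("a"))
    first = rest[zeros]                      # first letter above 'a'
    if first == "b":
        # No letter fits strictly between 'a' and 'b': go one level deeper.
        return lo + "a" * (zeros + 1) + "n"
    middle = chr((ord("a") + ord(first)) // 2)
    return lo + "a" * zeros + middle

# Python compares tuples of strings level by level, which reproduces
# the comparison examples on the slide.
slide_examples = [
    parse("b") < parse("b.c"),
    parse("bab") < parse("bd.cc"),
    parse("b.b") < parse("b.b.c"),
]

# A new first child under book b.d, placed before the title b.d.f.
new_price = parse("b.d") + (between("", "f"),)
```

The key generated here is b.d.c rather than the b.d.b shown on the slides; any key that sorts before b.d.f would do, and no existing key has to change.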

30

LexKeys in XAT Tables

[Figure: the operator "$b, price $col2" shown twice. With XML nodes, the input table ($b) holds the three <book> elements and the output table ($b, $col2) pairs two of them with <price> 65.95 and <price> 39.95. With LexKeys, the same tables hold only keys:]

Input table    Output table
$b             $b     $col2
b.b            b.b    b.b.b
b.d            b.f    b.f.cm
b.f

31

Order Among XAT Tuples

Notion: designate an order schema for XAT tables
Ordering by the LexKeys in the columns of the order schema yields the correct tuple order.

$b     $c       $d
b.f    b.b.b    c.m
b.b    b.f.cm   d.c
b.b    b.f.cm   d.c.b

[Figure annotations: the order schema columns are marked 1 and 2, and the tuples are labeled with their resulting positions 1, 2 and 3.]

32

Calculating Order Schema

Rules for each operator
Calculated in a postorder traversal of the tree

Sample rules

Operator                                Order schema odc(out)
Tagger T pattern $col’ (s)              odc(s)
Source S desc $col’                     none
Navigate Unnest $col, path $col’ (s)    if col is last in odc(s): Concat(odc(s) - col, col’)
                                        else: Concat(odc(s), col’)
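The three sample rules can be read as small functions over lists of column names. The Python sketch below is an illustration under assumptions: the function names are hypothetical, "none" is modelled as an empty list, and the column names follow the running bib.xml example.

```python
def lex(key):
    """Split a LexKey such as 'b.f.cm' into its levels for comparison."""
    return tuple(key.split("."))

def odc_source():
    # Source: no order schema ("none" on the slide).
    return []

def odc_tagger(odc_s):
    # Tagger: the order schema of the input is kept.
    return list(odc_s)

def odc_navigate_unnest(odc_s, col, new_col):
    # Navigate Unnest: replace col if it is last in odc(s), else append.
    if odc_s and odc_s[-1] == col:
        return odc_s[:-1] + [new_col]
    return odc_s + [new_col]

def sort_by_order_schema(rows, odc):
    """Order tuples by the LexKeys found in the order schema columns."""
    return sorted(rows, key=lambda row: tuple(lex(row[c]) for c in odc))

# Postorder (bottom-up) computation along the navigation chain.
odc = odc_source()
odc = odc_navigate_unnest(odc, "$S1", "$col1")
odc = odc_navigate_unnest(odc, "$col1", "$b")
odc_price = odc_navigate_unnest(odc, "$b", "$col2")
odc_title = odc_navigate_unnest(odc_price, "$b", "$col4")

# Tuples stored in arbitrary physical order are read in schema order.
rows = [
    {"$b": "b.b", "$col2": "b.b.b"},
    {"$b": "b.f", "$col2": "b.f.cm"},
    {"$b": "b.d", "$col2": "b.d.b"},
]
ordered = sort_by_order_schema(rows, odc_price)
```

Because order is derived from the keys at read time, a newly inserted tuple can be stored anywhere in the table.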

33

Order Among Tuples Example

[Figure: the tables of the operator "$b, price $col2", in both XML-node and LexKey form, annotated with the order schema column and the resulting tuple positions:]

Input table ($b): b.b (1), b.d (2), b.f (3)
Output table ($b, $col2): (b.b, b.b.b) (1), (b.f, b.f.cm) (2)

34

Order in Collection within a cell?

[Figure: the Agg $col5 operator shown twice. With XML nodes, the two <book> tuples ("TCP/IP ...", price 55.48, and "Data ...", price 39.95) become one cell holding the sequence of both books. With keys, the input table is:]

$col2    $col4    $col5
b.f.cm   b.f.l    tbb
b.d.f    b.d.b    tbc

[The output cell $col5 holds the collection { tbb, tbc }. The order schema columns are marked 1 and 2, and the tuples and collection members are labeled with order positions 1 and 2.]

35

Smart Keys

What is a SmartKey?
  Key (LexKey): the key part, which by default also represents order
  Overriding order (LexKey): optional, only represents order when present

Notation: key(order)

Examples
  b.c.b(h)
  b.c.b

36

SmartKeys in XAT Tables

[Figure: the Agg $col5 operator shown twice. With XML nodes, the two <book> tuples ("TCP/IP ...", price 55.48, and "Data ...", price 39.95) become one cell holding the sequence of both books. With keys, the input table is:]

$col2    $col4    $col5
b.f.cm   b.f.l    tbb
b.d.f    b.d.b    tbc

[The output cell $col5 holds the SmartKeys { tbb(b.f.cm..b.f.l), tbc(b.d.f..b.d.b) }, whose overriding order parts carry the order of the members within the cell.]
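A SmartKey can be modelled as a key plus an optional overriding order. The Python sketch below is an assumption-laden illustration: the class layout, the `sort_key` method, and the reading of ".." as a separator between the LexKeys that make up a composite order are not confirmed by the slides.

```python
from dataclasses import dataclass
from typing import Optional

def lex(key):
    """Split a LexKey such as 'b.f.cm' into its levels."""
    return tuple(key.split("."))

@dataclass(frozen=True)
class SmartKey:
    key: str                     # identifies the node
    order: Optional[str] = None  # overriding order, written key(order)

    def sort_key(self):
        # The key part represents order unless an overriding order exists.
        text = self.order if self.order is not None else self.key
        # Assumption: '..' separates the LexKeys of a composite order.
        return tuple(lex(part) for part in text.split(".."))

# Members of the aggregated cell from the slide.
cell = [
    SmartKey("tbb", "b.f.cm..b.f.l"),
    SmartKey("tbc", "b.d.f..b.d.b"),
]
in_order = sorted(cell, key=SmartKey.sort_key)
```

Sorting by the overriding order puts tbc (built from the second book, b.d) before tbb (built from the third book, b.f), even though the key tbb alone sorts before tbc.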

37

The Impact of SmartKeys on View Maintenance

38

Order Among XAT Tuples during View Maintenance

Other tuples in the XAT table are not touched
No reordering is ever needed
Gains distributiveness with regard to bag union at the tuple level

[Figure: operator "$col1, price $col3" after the insert. The new tuples are stored last, and the order positions come from the LexKeys:]

Input table ($b): b.b (1), b.f (3), b.d (2)
Output table ($b, $col3): (b.b, b.b.b) (1), (b.f, b.f.cm) (3), (b.d, b.d.b) (2)

39

Order in a Sequence during View Maintenance

Other members of the sequence are not touched
No reordering is ever needed
Gains distributiveness with regard to bag union at the cell level

[Figure: Agg $col5 takes the input tuples tb..b.f.l..b.f.cm and tb..b.d.f..b.d.b in column $col5 and produces the single cell { tb..b.f.l..b.f.cm, tb..b.d.f..b.d.b }, with the members labeled by order positions 1 and 2.]

40

Update Propagation Rules

Use distributiveness with regard to bag union
Reuse rules from relational for most SQL XAT operators

[Figure: at execution time an operator maps XAT table 1 to XAT table 2; at view maintenance time the same operator maps an update to XAT table 1 to an update to XAT table 2.]

41

Update Propagation Rules Example (Navigate Unnest on Insert Tuple)

\[
\begin{aligned}
T_2^{old} &= \phi_{\$col,\,path}^{\$col'}(T_1^{old}) \\
T_1^{new} &= T_1^{old} + \Delta T_1 \\
T_2^{new} &= \phi_{\$col,\,path}^{\$col'}(T_1^{old} + \Delta T_1) \\
          &= \phi_{\$col,\,path}^{\$col'}(T_1^{old}) + \phi_{\$col,\,path}^{\$col'}(\Delta T_1) \\
          &= T_2^{old} + \Delta T_2
\end{aligned}
\]

+ represents bag union

[Figure: at execution time the Navigate Unnest operator maps T1 to T2; at view maintenance time the same operator maps the update to T1 to the update to T2.]
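The distributivity used in this rule can be checked on the running example with a toy Python model. Everything here is an assumption made for illustration: the table representation (a list of dict rows), the `source` lookup that maps a node key and a path step to child keys, and the function names.

```python
from collections import Counter

def navigate_unnest(table, source, col, path, new_col):
    """For each row, emit one output row per node reached from row[col]."""
    out = []
    for row in table:
        for child in source.get(row[col], {}).get(path, []):
            out.append({**row, new_col: child})
    return out

def bag(table):
    """A table viewed as a bag (multiset) of rows, ignoring physical order."""
    return Counter(tuple(sorted(row.items())) for row in table)

# bib.xml after the update, reduced to LexKeys (keys taken from the slides).
source = {
    "b.b": {"price": ["b.b.b"], "title": ["b.b.cd"]},
    "b.d": {"price": ["b.d.b"], "title": ["b.d.f"]},
    "b.f": {"price": ["b.f.cm"], "title": ["b.f.l"]},
}

t1_old = [{"$b": "b.b"}, {"$b": "b.f"}]
delta_t1 = [{"$b": "b.d"}]                       # inserted tuple

t2_old = navigate_unnest(t1_old, source, "$b", "price", "$col2")
delta_t2 = navigate_unnest(delta_t1, source, "$b", "price", "$col2")
t2_new = navigate_unnest(t1_old + delta_t1, source, "$b", "price", "$col2")
```

Only `delta_t1` has to be pushed through the operator at maintenance time; the old output is reused as is, and the order of the combined result is recovered from the keys.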

42

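Update Propagation Strategy

[Figure: an update XQuery on the XML sources passes through the Translator as an XML update primitive (xmlup); the Storage Manager yields a key update primitive (keyup); XAT update primitives (xatup) are propagated through the XAT to produce the update to the XML view.]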

43

Update Primitives (The Format of Delta)

XML Update Primitives (xup): apply to the original XML document
  Insert (xmlFragment, path)
  Delete (path)
  InsertAtt (name, value, path)
  DeleteAtt (name, path)
  Replace (oldValue, newValue, path)

XML Key Update Primitives (keyup): express the update on the original XML data in terms of LexKeys
  Insert (el, path)
  Delete (path)
  Replace (el, pos)

XAT Update Primitives (xatup): apply to the XAT table
  InsertTuple (tuple)
  DeleteTuple (tupleId)
  ChangeTuple (Keyup, columnName, tupleId)

44

A Complete Example

45

Execution

Algebra tree for the view definition query, bottom-up
  Source S "bib.xml" $S1
  Navigate Unnest $S1, bib $col1
  Navigate Unnest $col1, book $b
  Navigate Unnest $b, price $col2
  Navigate Unnest $b, title $col4
  Select $col3 < 60
  Tagger T <book>$col4 $col2</book> $col5
  Agg $col5
  Tagger T <result>$col5</result> $col6

Storage Manager: bib.xml with LexKeys
  bib           b
    book        b.b
      price     b.b.b
      title     b.b.cd
    book        b.d
      title     b.d.f
    book        b.f
      price     b.f.cm
      title     b.f.l

Intermediate XAT tables (LexKeys)
  $col1: b
  $b: b.b, b.d, b.f
  ($b, $col2): (b.b, b.b.b), (b.f, b.f.cm)
  ($col2, $col4): (b.b.b, b.b.cd), (b.f.cm, b.f.l)
  ($col2, $col4) after the selection: (b.f.cm, b.f.l)
  $col5 after the <book> Tagger: tb..b.f.l..b.f.cm
  $col5 after Agg: { tb..b.f.l..b.f.cm(b.f.l..b.f.cm) }
  $col6: tr

Constructed XDOMs
  XDOMKey tb..b.f.l..b.f.cm: book with children b.f.l, b.f.cm
  XDOMKey tr: result with child tb..b.f.l..b.f.cm

46

View Maintenance

The algebra tree, the Storage Manager contents and the constructed XDOMs are those of the Execution slide.

Source update
  xmlup: Insert (price, bib[1].book[2])
  keyup: Insert (price[b.d.b], bib[b].book[b.d])
  The new price node receives LexKey b.d.b.

Updates propagated up the algebra tree
  $col1, tuple b: ChangeTuple(insert(price[b.d.b], bib[b].book[b.d]), $col1, b)
  $b, tuple b.d: ChangeTuple(insert(price[b.d.b], book[b.d]), $b, b.d)
  ChangeTuple(insert(price[b.d.b], bib[b].book[b.d]), $col2, b.f, b.f.m)
  After $b, price $col2: InsertTuple({b.d, b.d.b})
  After $b, title $col4: InsertTuple({b.d.b, b.d.f})
  After $col3 < 60: InsertTuple({b.d.b, b.d.f})
  After T <book>$col4 $col2</book> $col5: InsertTuple({tb..b.d.f..b.d.b})
  After Agg $col5: ChangeTuple(insert(tb..b.d.f..b.d.b, null), $col5, )
  After T <result>$col5</result> $col6: ChangeTuple(insert(tb..b.d.f..b.d.b, result[tr]), $col6, tr)

Resulting XAT tables (LexKeys)
  ($b, $col2): (b.b, b.b.b), (b.f, b.f.cm), (b.d, b.d.b)
  ($col2, $col4): (b.b.b, b.b.cd), (b.f.cm, b.f.l), (b.d.b, b.d.f)
  ($col2, $col4) after the selection: (b.f.cm, b.f.l), (b.d.b, b.d.f)
  $col5 after the <book> Tagger: tb..b.f.l..b.f.cm, tb..b.d.f..b.d.b
  $col5 after Agg: { tb..b.f.l..b.f.cm(b.f.l..b.f.cm), tb..b.d.f..b.d.b(..b.d.f..b.d.b) }
  $col6: tr

Constructed XDOMs
  XDOMKey tb..b.f.l..b.f.cm: book with children b.f.l, b.f.cm
  XDOMKey tb..b.d.f..b.d.b: book with children b.d.f, b.d.b (new)
  XDOMKey tr: result with children tb..b.f.l..b.f.cm and tb..b.d.f..b.d.b

47

Outline

Motivation
Problem Description
Background on XAT
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

48

System Architecture

[Figure: system architecture diagram. Legend: process, data, one-time occurrence, on-update occurrence. Components shown:
  Rainbow (XML query engine): View Definition XQuery, XML Algebra Tree, VM Initializer, XML View Maintainer, Update Propagation Rules Repository
  XTUP: Update XQuery (from the user), Update Primitive Generator, Executer
  Storage Manager and persistent data storage: XML sources, Materialized XML View, Materialized Auxiliary Views
The diagram is split into an execution part and a view maintenance part.]

49

Outline

Motivation
Problem Description
Background on XAT
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

50

Related Work

A. Gupta, I. S. Mumick. Maintenance of Materialized Views: Problems, Techniques, and Applications. In Bulletin of the Technical Committee on Data Engineering, 1995.

T. Griffin, L. Libkin. Incremental Maintenance of Views with Duplicates. In SIGMOD 1995.

H. Liefke, S. Davidson. View Maintenance for Hierarchical Semistructured Data. In DaWaK 2000.

S. Abiteboul, J. McHugh, M. Rys, V. Vassalos, J. Wiener. Incremental Maintenance for Materialized Views over Semistructured Data. In VLDB 1998.

51

Outline

Motivation
Problem Description
Background on XAT
  XML Algebra
  Order in XML Algebra
The IVOX Approach
  Order Encoding
  Overall strategy
System Architecture
Related Work
Future Work

52

Future Work

Near future ...
  Launch the system
  Batch update
  Experiments and evaluation
    Compare the system's performance to recomputation

... and beyond
  Batching updates coming from different sources
  Integrity constraints
  Algebra tree rewrite rules