
Incremental Maintenance of XML Structural Indexes

Ke Yi1, Hao He1, Ioana Stanoi2 and Jun Yang1

1 Department of Computer Science, Duke University
2 IBM T. J. Watson Research Center


Motivation

XML has gained tremendously in popularity in recent years
  Used to represent many kinds of data
Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  IBM DB2, Oracle, Microsoft SQL Server
  Tamino, Natix, X-Hive, …


Overview

[Figure: example data graph of an XML document rooted at a paper node (node 1), with section, title, algorithm, proof, uses, exp, and about nodes numbered up to 18; the title values shown are “intro”, “1-index”, “A(k)-index”, and “experiments”]


Label Path Expressions

[Figure: the example data graph of a paper document]

Example: /paper/section/algorithm
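As a rough illustration of what evaluating such a label path expression means, the sketch below walks a data graph from the root one label at a time. The function name `eval_label_path`, the dictionary encoding, and the small graph fragment are assumptions made for this example; they are not taken from the slides.

```python
def eval_label_path(path, root, label, children):
    """Return the nodes reached from the root by following the labels in `path`."""
    steps = [s for s in path.split("/") if s]
    if not steps or label[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step of the path.
        frontier = {c for v in frontier
                    for c in children.get(v, ())
                    if label[c] == step}
    return frontier

# Hypothetical fragment of a data graph (node ids and edges are assumptions).
label = {1: "paper", 2: "section", 3: "title",
         4: "section", 5: "title", 6: "algorithm"}
children = {1: [2, 4], 2: [3], 4: [5, 6]}

print(eval_label_path("/paper/section/algorithm", 1, label, children))  # {6}
```

Without an index, every step touches all children of the current frontier, which is the cost a structural index is meant to reduce.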


Structural Indexes

Why do we need them?
  Speed up the evaluation of path expressions
  Provide a structural summary of the data graph
Structural indexes
  DataGuide [Goldman & Widom 97]
  1-index [Milo & Suciu 99]
  A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  Integration of structural indexes and inverted lists [Kaushik et al. 04]
Focus on maintenance
  Has a major effect on index efficiency
  Remains an overlooked issue


Outline

[Figure: the example data graph, whose section titles are “intro”, “1-index”, “A(k)-index”, and “experiments”]


1-Index: Definition

Constructed by using bisimilarity
Definition based on stability
  Partition data nodes into index nodes
  dnode (v) and inode (I[v])
  I[u] is v’s index parent if u is v’s parent
  An inode is stable if all of its dnodes have the same index parents
  In a 1-index, all inodes are stable

[Figure: dnode v inside its inode I[v], and its parent u inside the inode I[u]]
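The stability condition can be sketched in a few lines. The names below (`index_parents`, `is_stable`, `extent`, `inode_of`) and the tiny graph are assumptions chosen for illustration only; the slides do not prescribe a data layout.

```python
def index_parents(v, parents, inode_of):
    """Set of inodes that contain at least one parent of dnode v."""
    return frozenset(inode_of[u] for u in parents.get(v, ()))

def is_stable(inode, extent, parents, inode_of):
    """An inode is stable if all of its dnodes have the same index parents."""
    signatures = {index_parents(v, parents, inode_of) for v in extent[inode]}
    return len(signatures) <= 1

# Hypothetical data graph: r -> a1, r -> a2, a1 -> b1, r -> b2.
parents = {"a1": ["r"], "a2": ["r"], "b1": ["a1"], "b2": ["r"]}
# Hypothetical partition of the dnodes into inodes R, A, and B.
extent = {"R": {"r"}, "A": {"a1", "a2"}, "B": {"b1", "b2"}}
inode_of = {v: i for i, vs in extent.items() for v in vs}

print(is_stable("A", extent, parents, inode_of))  # True
print(is_stable("B", extent, parents, inode_of))  # False
```

Here inode B is not stable because b1 has index parent A while b2 has index parent R, so this partition is not a 1-index until B is split.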


1-Index: Example

[Figure: the example data graph and its 1-index]

1-index inodes (label: dnodes in the inode):
  paper: {1}
  section: {2, 4, 8, 13}
  title: {3, 5, 9, 14}
  algorithm: {6, 10}
  proof: {7} and {11}
  uses: {12}
  exp: {15, 16}
  about: {17, 18}

/paper/section/algorithm


1-Index: Quality

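Assigning bisimilar dnodes to different inodes does not affect correctness, but does affect efficiency

The quality of an index:

$$\text{quality} = \left( \frac{\#\,\text{inodes}}{\#\,\text{inodes in the minimum 1-index}} - 1 \right) \times 100\%$$

[Figure: the example 1-index (inodes {1} paper; {2, 4, 8, 13} section; {3, 5, 9, 14} title; {6, 10} algorithm; {7} proof; {11} proof; {12} uses; {15, 16}; {17, 18}) with the section inode split into {2, 4} and {8, 13}]

Ideal: quality = 0%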
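As a worked instance of the quality measure, the snippet below plugs in the counts from this slide's figure: the minimum 1-index has nine inodes, and splitting the section inode into {2, 4} and {8, 13} gives ten. The function name `quality` is just a label chosen for this sketch.

```python
def quality(num_inodes, num_min_inodes):
    """Percentage by which an index exceeds the size of the minimum 1-index."""
    return (num_inodes / num_min_inodes - 1) * 100.0

# Nine inodes in the minimum 1-index, ten after the section inode is split.
print(round(quality(10, 9), 1))  # 11.1
print(quality(9, 9))             # 0.0
```

So a single unnecessary split on this small example already costs about 11%, while the minimum index scores the ideal 0%.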


Previous Results

Construction
  The PT algorithm [Paige & Tarjan 87], in time O(m log n)
  m = # edges, n = # nodes
Edge changes
  The propagate algorithm [Kaushik et al. 02]
  Quality of the 1-index after update
    No guarantee on the quality of the resulting index
    3 ~ 5% after 500 edge insertions in experiments
Subgraph addition
  Index reconstruction
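To make "construction" concrete, here is a minimal sketch of building the minimum 1-index by naive partition refinement: start from the partition by label and keep splitting until every inode is stable. This is deliberately not the PT algorithm and does not achieve its O(m log n) bound; the function name and the graph encoding are assumptions for illustration.

```python
def minimum_1_index(label, parents):
    """Naive refinement: split the label partition until all inodes are stable."""
    block_of = dict(label)  # initial block id of a dnode is its label
    while True:
        # Signature = current block plus the set of blocks holding the parents.
        sig = {v: (block_of[v],
                   frozenset(block_of[u] for u in parents.get(v, ())))
               for v in label}
        ids = {}
        refined = {v: ids.setdefault(sig[v], len(ids)) for v in label}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = refined
    groups = {}
    for v, b in refined.items():
        groups.setdefault(b, []).append(v)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data graph: r -> a1, r -> a2, a1 -> b1, a2 -> b2.
label = {"r": "R", "a1": "A", "a2": "A", "b1": "B", "b2": "B"}
parents = {"a1": ["r"], "a2": ["r"], "b1": ["a1"], "b2": ["a2"]}

print(minimum_1_index(label, parents))
# [['a1', 'a2'], ['b1', 'b2'], ['r']]
```

Rebuilding the index this way after every update is the reconstruction baseline that incremental maintenance tries to avoid.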


Edge Insertion: An Example (1)

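[Figure: three panels showing the data graph, its 1-index, and the index after the first split]

Data graph: R; A, B; C1, C2, C3; D1, D2, D3
1-index: R; A; B; {C1, C2}; {C3}; {D1, D2}; {D3}
Split 1: R; A; B; {C1}; {C2}; {C3}; {D1, D2}; {D3}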


Edge Insertion: An Example (2)

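[Figure: three panels showing the index after the second split and after each of the two merges]

Split 2: R; A; B; {C1}; {C2}; {C3}; {D1}; {D2}; {D3}
Merge 1: R; A; B; {C1}; {C2, C3}; {D1}; {D2}; {D3}
Merge 2: R; A; B; {C1}; {C2, C3}; {D1}; {D2, D3}

The result is indeed the minimum 1-index for the data graph after the update. Not a coincidence!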
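The split and merge phases of this example can be replayed with a simplified, global sketch. It is not the authors' incremental algorithm: it rescans the whole index, and it assumes an acyclic graph. The edge set and the inserted edge B -> C2 are guesses chosen to be consistent with the inodes listed in the example, since the drawn edges are not recoverable from the transcript.

```python
from collections import defaultdict

def index_parents(v, parents, block_of):
    """Ids of the inodes that contain a parent of dnode v."""
    return frozenset(block_of[u] for u in parents.get(v, ()))

def split_unstable(blocks, parents):
    """Split inodes whose dnodes disagree on index parents, to a fixpoint."""
    changed = True
    while changed:
        changed = False
        block_of = {v: i for i, b in enumerate(blocks) for v in b}
        new_blocks = []
        for b in blocks:
            groups = defaultdict(set)
            for v in b:
                groups[index_parents(v, parents, block_of)].add(v)
            changed = changed or len(groups) > 1
            new_blocks.extend(groups.values())
        blocks = new_blocks
    return blocks

def merge_equal(blocks, parents, label):
    """Merge stable inodes with equal label and index parents, to a fixpoint."""
    changed = True
    while changed:
        block_of = {v: i for i, b in enumerate(blocks) for v in b}
        groups = defaultdict(set)
        for b in blocks:
            v = next(iter(b))  # any dnode represents a stable inode
            groups[(label[v], index_parents(v, parents, block_of))] |= b
        changed = len(groups) < len(blocks)
        blocks = list(groups.values())
    return blocks

# Assumed edges: R->A, R->B, A->C1, A->C2, A->C3, B->C3, Ci->Di.
label = {v: v[0] for v in ["R", "A", "B", "C1", "C2", "C3", "D1", "D2", "D3"]}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"],
           "C3": ["A", "B"], "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}
index = [{"R"}, {"A"}, {"B"}, {"C1", "C2"}, {"C3"}, {"D1", "D2"}, {"D3"}]

parents["C2"].append("B")  # assumed update: insert the edge B -> C2
index = merge_equal(split_unstable(index, parents), parents, label)
print(sorted(sorted(b) for b in index))
# [['A'], ['B'], ['C1'], ['C2', 'C3'], ['D1'], ['D2', 'D3'], ['R']]
```

Under these assumptions the split phase passes through the states labeled Split 1 and Split 2, and the merge phase through Merge 1 and Merge 2.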


Minimum & Minimal Indexes

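Minimum: with the smallest number of inodes
Minimal: no two inodes can be merged

[Figure: a data graph with nodes R, A1, A2, B1, B2, shown next to two of its 1-indexes. The minimum 1-index has inodes R, {A1, A2}, {B1, B2}; a minimal 1-index that is not minimum keeps R, A1, A2, B1, B2 in separate inodes]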


Quality Guarantee

Theorem: The split/merge algorithm always maintains a minimal 1-index

Lemma: For acyclic data graphs, there is a unique minimal 1-index
  The minimum 1-index is always maintained

For cyclic data graphs, there could be more than one minimal 1-index
  One of them is maintained


Outline

[Figure: the example data graph, whose section titles are “intro”, “1-index”, “A(k)-index”, and “experiments”]


A(k)-Index: Definition

k-bisimilarity
Definition based on stability
  A(0)-index: partition by label
  …
  A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
Only interested in paths of length ≤ k
Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
But no efficient maintenance algorithms are known!


A(k)-index: Example

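[Figure: four panels showing the data graph and its A(2), A(1), and A(0) indexes]

Data graph: R; A, B; C1, C2, C3; C4, C5, C6
A(2) (= 1-index): R; A; B; {C1}; {C2, C3}; {C4}; {C5, C6}
A(1): R; A; B; {C1}; {C2, C3}; {C4, C5, C6}
A(0): R; A; B; {C1, C2, C3, C4, C5, C6}

Maintenance of the A(i)-index requires the information in the A(i-1)-index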
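The level-by-level definition can be sketched as k rounds of refinement, each round using only the previous level, which is why A(i) depends on A(i-1). The edge set below is an assumption chosen so that the resulting partitions match the ones listed in the example; the function names are likewise illustrative.

```python
def ak_levels(label, parents, k):
    """Return [A(0), ..., A(k)], each as a dict mapping dnode -> block id."""
    levels = [dict(label)]  # A(0): partition by label
    for _ in range(k):
        prev = levels[-1]
        # A(i) signature: own A(i-1) block plus the A(i-1) blocks of the parents.
        sig = {v: (prev[v], frozenset(prev[u] for u in parents.get(v, ())))
               for v in label}
        ids = {}
        levels.append({v: ids.setdefault(sig[v], len(ids)) for v in label})
    return levels

def blocks(level):
    """Group dnodes by block id into a set of frozensets."""
    groups = {}
    for v, b in level.items():
        groups.setdefault(b, set()).add(v)
    return {frozenset(g) for g in groups.values()}

# Assumed edges: R->A, R->B, A->C1, B->C2, B->C3, C1->C4, C2->C5, C3->C6.
nodes = ["R", "A", "B", "C1", "C2", "C3", "C4", "C5", "C6"]
label = {v: v[0] for v in nodes}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["B"], "C3": ["B"],
           "C4": ["C1"], "C5": ["C2"], "C6": ["C3"]}

a0, a1, a2 = ak_levels(label, parents, 2)
print(sorted(sorted(b) for b in blocks(a1)))
# [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4', 'C5', 'C6'], ['R']]
print(sorted(sorted(b) for b in blocks(a2)))
# [['A'], ['B'], ['C1'], ['C2', 'C3'], ['C4'], ['C5', 'C6'], ['R']]
```

With these assumed edges, C4, C5, and C6 stay together in A(1) because each has a parent labeled C, and separate in A(2) once C1 is distinguished from C2 and C3.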


A(k)-index: Refinement Tree

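[Figure: the data graph (R; A, B; C1 to C6) with its A(2), A(1), and A(0) indexes arranged as levels of a refinement tree]

A(2) (= 1-index): R; A; B; {C1}; {C2, C3}; {C4}; {C5, C6}
A(1): R; A; B; {C1}; {C2, C3}; {C4, C5, C6}
A(0): R; A; B; {C1, C2, C3, C4, C5, C6}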


A(k)-index: Refinement Tree

[Figure: the data graph and the A(2), A(1), and A(0) levels of the refinement tree, with each inode shown by its label only (R, A, B, C)]

0.5% ~ 13% additional storage

1. Reduce storage cost
2. Reduce maintenance cost
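One way to picture the refinement tree is as child-to-parent links between inodes of successive A(i) levels: each A(i) inode points to the A(i-1) inode it was split from. The partitions below are read off the example figure; the encoding as (level, frozenset) pairs and the function name `refinement_tree` are assumptions for illustration, not the storage layout used by the authors.

```python
def refinement_tree(levels):
    """Map each (i, block) at level i >= 1 to its containing block at level i-1."""
    parent = {}
    for i in range(1, len(levels)):
        for block in levels[i]:
            # Exactly one coarser block contains a block of the finer partition.
            (coarser,) = [b for b in levels[i - 1] if block <= b]
            parent[(i, block)] = (i - 1, coarser)
    return parent

def part(*groups):
    return [frozenset(g) for g in groups]

all_c = ["C1", "C2", "C3", "C4", "C5", "C6"]
levels = [
    part(["R"], ["A"], ["B"], all_c),                                   # A(0)
    part(["R"], ["A"], ["B"], ["C1"], ["C2", "C3"], ["C4", "C5", "C6"]),  # A(1)
    part(["R"], ["A"], ["B"], ["C1"], ["C2", "C3"], ["C4"], ["C5", "C6"]),  # A(2)
]

tree = refinement_tree(levels)
level, block = tree[(2, frozenset(["C4"]))]
print(level, sorted(block))  # 1 ['C4', 'C5', 'C6']
```

Because every finer inode has exactly one coarser ancestor, the levels share structure instead of being stored as independent indexes.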


Quality Guarantee

Theorem: The split/merge algorithm always maintains the minimum A(k)-index

Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic

Index maintained by the split/merge algorithm:

           1-index    A(k)-index
Acyclic    minimum    minimum
Cyclic     minimal    minimum

(minimal: a minimal index; minimum: the minimum index)


Outline

[Figure: the example data graph, whose section titles are “intro”, “1-index”, “A(k)-index”, and “experiments”]


Experiments on Edge Changes

Datasets
  Real-life: IMDB (272,000 nodes)
  Benchmark: XMark (198,000 nodes)
Setup
  First delete a portion of existing ID-REF links
  Then do random mixed insertions/deletions
Compare with
  1-index: propagate (+ reconstruction)
  A(k)-index: recompute affected portion (+ reconstruction)


Experiment Results: 1-index


Experiment Results: A(k)-index

k    speedup
2    1.35
3    6.15
4    16.6
5    15.3

[Figure: running times]


Conclusions

The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
  Effective: quality guarantee on the resulting index
  Efficient: the algorithms themselves are fast

Thank you!


Graphical Illustration

[Figure: index size plotted for valid 1-indexes, with arrows labeled split and merge]

The index can only grow in size due to splitting if merging is not enforced