covering indexes for branching path queries

Post on 05-Feb-2016


Covering Indexes for Branching Path Queries

Raghav Kaushik, Philip Bohannon, Jeffrey F Naughton and Henry F Korth

Abdullah Mueen

XML as Graph Data

[Figure: an XML document drawn as a labeled graph. Callouts: each node has an oid and a label, e.g. label(3); non-tree edges model IDREF relationships in the document; leaf nodes with attributes are suppressed.]

Branching Path Expression

ROOT/metro/neighborhoods/neighborhood[/business=>cinema-hall]/cultural=>museum

Example (1)

//hotel[/star][<=business\neighborhood[/cultural=>museum[\art]]]

Covering Index

• A covering index can answer any query from a given set of queries without consulting the original document.

• The goal of this paper is to find a covering index for branching path queries.

k-bisimilarity

[Figure: data graph G with nodes 0 to 9, labeled R, A, B, C and D.]

Two nodes u and v are called k-bisimilar (u ≈k v) if

1. label(u) = label(v)
2. every incoming label path of length ≤ k to u matches at least one incoming path of length ≤ k to v, and vice versa.

In the example graph:

• 2,4 are 0-bisimilar
• 5,7 are 1-bisimilar
• 8,9 are 2-bisimilar
• 6,8 are 1-bisimilar

≈k defines an equivalence relation on the set of nodes in G; its equivalence classes partition the nodes.

The algorithm for computing k-bisimulation will be shown later
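The refinement idea behind k-bisimilarity can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the algorithm from the slides: it uses the common recursive formulation (u and v are k-bisimilar when they are (k-1)-bisimilar and their parents fall into the same (k-1)-classes). The names `k_bisimulation` and `_renumber` and the small graph at the end are hypothetical and chosen only for illustration.

```python
def _renumber(signature):
    """Map each distinct signature to a small integer block id."""
    ids = {}
    return {node: ids.setdefault(sig, len(ids)) for node, sig in signature.items()}


def k_bisimulation(labels, parents, k):
    """Return {node: block id} for the k-bisimilarity partition.

    labels  : dict node -> label
    parents : dict node -> list of parent nodes (incoming edges)
    """
    # Round 0: nodes with the same label share a block.
    block = _renumber(labels)
    for _ in range(k):
        # A node's signature is its current block plus the set of
        # blocks its parents currently belong to.
        signature = {
            n: (block[n], frozenset(block[p] for p in parents.get(n, ())))
            for n in labels
        }
        refined = _renumber(signature)
        # Refinement only splits blocks, so an unchanged block count
        # means the partition is already stable.
        if len(set(refined.values())) == len(set(block.values())):
            break
        block = refined
    return block


# Hypothetical graph (not the one on the slides):
# R -> A -> C -> D and R -> B -> C -> D.
labels = {0: "R", 1: "A", 2: "B", 3: "C", 4: "C", 5: "D", 6: "D"}
parents = {1: [0], 2: [0], 3: [1], 4: [2], 5: [3], 6: [4]}
one = k_bisimulation(labels, parents, 1)
two = k_bisimulation(labels, parents, 2)
```

In this toy graph the two D nodes share a block for k = 1, since both have a C parent, and are separated for k = 2, because their incoming paths pass through A and B respectively.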

1-index: Covering Index for Simple Path Expressions

[Figure: the data graph G and the index graphs A(0), A(1), A(2) and A(3); the three refinement steps between successive index graphs are labeled SuccStable. Node groupings shown in the figure:]

A(0): {1} {2,4} {3,5,7} {6,8,9}
A(1): {1} {2} {4} {3} {5,7} {6,8,9}
A(2): {1} {2} {4} {3} {5} {7} {6} {8,9}
A(3) = 1-index: {1} {2} {4} {3} {5} {7} {6} {8} {9}

Inverse edges

[Figure: two copies of the data graph G illustrating inverse edges. Captions: in one copy 5,7 are not 1-bisimilar; in the other 5,7 are 1-bisimilar.]

The F&B index

• Repeat until the partition no longer changes:
– Reverse all edges.
– Compute the forward bisimilarity partition.
– Reverse all edges again.
– Compute the backward bisimilarity partition.
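The loop above can be sketched as alternating refinement over outgoing and incoming edges until neither step splits a block. This is a minimal sketch, assuming a naive signature-based refinement rather than an efficient partition-refinement algorithm; `fb_partition`, `_stabilize`, `_renumber` and the example graph are hypothetical names and data, not taken from the paper.

```python
from collections import defaultdict


def _renumber(signature):
    """Map each distinct signature to a small integer block id."""
    ids = {}
    return {node: ids.setdefault(sig, len(ids)) for node, sig in signature.items()}


def _stabilize(block, neighbours):
    """Split blocks until all nodes of a block see the same neighbour blocks."""
    while True:
        signature = {
            n: (b, frozenset(block[m] for m in neighbours.get(n, ())))
            for n, b in block.items()
        }
        refined = _renumber(signature)
        # Refinement only splits, so an equal block count means no change.
        if len(set(refined.values())) == len(set(block.values())):
            return block
        block = refined


def fb_partition(labels, children):
    """Return {node: block id} stable in both edge directions."""
    parents = defaultdict(list)
    for u, vs in children.items():
        for v in vs:
            parents[v].append(u)
    block = _renumber(labels)  # start from the label grouping
    while True:
        size = len(set(block.values()))
        block = _stabilize(block, children)  # forward: outgoing paths
        block = _stabilize(block, parents)   # backward: incoming paths
        if len(set(block.values())) == size:
            return block


# Hypothetical graph: R has three B children; only two of them have a C child.
labels = {0: "R", 1: "B", 2: "B", 3: "B", 4: "C", 5: "C"}
children = {0: [1, 2, 3], 1: [4], 2: [5]}
fb = fb_partition(labels, children)
```

In this toy graph all three B nodes look alike from above, but node 3 has no C child, so the forward step separates it from nodes 1 and 2.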

Forward Bisimulation

[Figure: four views of the data graph G (nodes 0 to 9, labels R, A, B, C, D) illustrating forward bisimulation.]

Backward Bisimulation

[Figure: three views of the data graph G (nodes 0 to 9, labels R, A, B, C, D) illustrating backward bisimulation.]

Properties of F&B index

• The F&B index over a data graph G covers all branching path expressions.

• The F&B index is the smallest index that covers all branching path queries.

• In practice, the F&B index is large for most real documents.

1. Tags to be indexed

• Some tags are never used in queries, e.g. bold, emph.
• We specify the set of tags to be indexed.
• In a 100MB document, the F&B index on all tags has 436,000 nodes, while the index that ignores formatting tags has 18,000 nodes.

2. IDREF edges to be indexed

• IDREF edges are not traversed by the // operator.
• IDREF edges appear explicitly in a path expression through the => operator.
• We specify the set of IDREF edges to be indexed.
• For the 100MB document, the F&B index has 1.3 million nodes with all IDREF edges, but 18,000 nodes without any IDREF edges or formatting tags.

3. Exploiting Local Similarity

• Long queries are infrequent and of little interest.
• If we restrict the length of the possible queries, we can get a much smaller index than the F&B index.
• We specify the length of the local path by using k-bisimilarity instead of bisimilarity while computing the F&B index.

4. Restricting Tree Depth

• Long nested conditions are less likely to occur.
• We specify the maximum depth of the conditional path expression by tree-depth (defined next).

tree depth

//museums/history/museum[/featured and <=cultural\neighborhood[/cultural=>museum[\art]]]

Definition of an Index

• A set of tags T
• Sets of IDREF edges for the two directions, reffwd and refbwd
• Two parameters kbwd and kfwd to restrict the length of the path queries
• One parameter td to restrict the depth of the nested conditional expression

The BPCI index

• Remove all tags not in T such that the removal does not cut out a tag in T.
• Start with label grouping as the current partition P.
• For i = 0, while i ≤ td:
– Reverse all edges in G, retain IDREF edges only in reffwd.
– P ← forward kfwd-bisimilar partition of P; increment i.
– Reverse all edges in G again, retain IDREF edges only in refbwd.
– P ← backward kbwd-bisimilar partition of P; increment i.
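A literal reading of these steps can be sketched as bounded refinement in alternating directions. This is a minimal sketch under several assumptions: the graph has already been restricted to the tags in T, IDREF edges are plain (source, target) pairs, a "k-bisimilar partition of P" is taken to mean k rounds of signature refinement starting from P, and the loop increments i after each step exactly as written above. The names `bpci_partition`, `_refine`, `_adjacency`, `_renumber` and the example graph are hypothetical.

```python
from collections import defaultdict


def _renumber(signature):
    """Map each distinct signature to a small integer block id."""
    ids = {}
    return {node: ids.setdefault(sig, len(ids)) for node, sig in signature.items()}


def _refine(block, neighbours, rounds):
    """Apply `rounds` refinement steps using the given adjacency."""
    for _ in range(rounds):
        signature = {
            n: (b, frozenset(block[m] for m in neighbours.get(n, ())))
            for n, b in block.items()
        }
        block = _renumber(signature)
    return block


def _adjacency(edges, reverse=False):
    """Build node -> neighbour list, optionally with edges reversed."""
    adj = defaultdict(list)
    for u, v in edges:
        if reverse:
            u, v = v, u
        adj[u].append(v)
    return adj


def bpci_partition(labels, tree_edges, idref_edges,
                   ref_fwd, ref_bwd, k_fwd, k_bwd, td):
    # Keep an IDREF edge only in the direction(s) where it is listed.
    children = _adjacency(list(tree_edges) + [e for e in idref_edges if e in ref_fwd])
    parents = _adjacency(list(tree_edges) + [e for e in idref_edges if e in ref_bwd],
                         reverse=True)
    block = _renumber(labels)  # label grouping
    i = 0
    while i <= td:
        block = _refine(block, children, k_fwd)  # forward step
        i += 1
        block = _refine(block, parents, k_bwd)   # backward step
        i += 1
    return block


# Hypothetical graph: R has three B children; nodes 1 and 2 have a C child,
# node 3 reaches a C node only through an IDREF edge.
labels = {0: "R", 1: "B", 2: "B", 3: "B", 4: "C", 5: "C"}
tree_edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
idref_edges = [(3, 4)]
without_ref = bpci_partition(labels, tree_edges, idref_edges, set(), set(), 1, 1, 0)
with_ref = bpci_partition(labels, tree_edges, idref_edges, {(3, 4)}, set(), 1, 1, 0)
```

With the IDREF edge left out, node 3 has no C child and is separated from nodes 1 and 2; once the edge is included in the forward direction, the three B nodes share a block again. Setting `k_fwd` to 0 skips forward refinement entirely, which gives a coarser and smaller partition.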

Variations of BPCI

Testing if an Index covers a Query

• Build the query graph.
• Check that all tags in the query are in T and all IDREF edges in the query are in (refbwd ∪ reffwd).
• Check that the tree depth of the query is less than td of the index.
• Check that all paths in the query with even tree depth have length < kbwd.
• Check that all paths in the query with odd tree depth have length < kfwd.
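These checks translate almost directly into code once the query graph has been summarized. The sketch below assumes that summary is already available: `QuerySummary` (tags used, IDREF edges used, and one (tree depth, length) pair per path) and `IndexDefinition` are hypothetical structures, and the comparisons are the strict ones stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDefinition:
    tags: frozenset      # T
    ref_fwd: frozenset   # IDREF edges indexed in the forward direction
    ref_bwd: frozenset   # IDREF edges indexed in the backward direction
    k_fwd: int
    k_bwd: int
    td: int


@dataclass(frozen=True)
class QuerySummary:
    tags: frozenset         # tags mentioned in the query
    idref_edges: frozenset  # IDREF edges mentioned in the query
    paths: tuple            # (tree_depth, length) for each path in the query graph


def covers(index, query):
    """Return True if the index passes every check for this query."""
    if not query.tags <= index.tags:
        return False
    if not query.idref_edges <= (index.ref_fwd | index.ref_bwd):
        return False
    for depth, length in query.paths:
        # The tree depth of the query is the largest depth of any path.
        if depth >= index.td:
            return False
        # Even depths are bounded by k_bwd, odd depths by k_fwd.
        limit = index.k_bwd if depth % 2 == 0 else index.k_fwd
        if length >= limit:
            return False
    return True


# Hypothetical index and queries.
index = IndexDefinition(
    tags=frozenset({"hotel", "star", "business"}),
    ref_fwd=frozenset(), ref_bwd=frozenset(),
    k_fwd=2, k_bwd=2, td=2,
)
short_query = QuerySummary(frozenset({"hotel", "star"}), frozenset(), ((0, 1), (1, 1)))
long_query = QuerySummary(frozenset({"hotel", "star"}), frozenset(), ((0, 2),))
```

Here `covers(index, short_query)` holds, while `long_query` fails because its depth-0 path has length 2, which is not below `k_bwd`.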

Results on the XMark benchmark

1. Iall is the F&B index
2. Iallmost-all is F&B with kfwd = 1
3. Ispecific is built on the query

Result

Conclusion

• BPCI is a covering index for branching path queries.

• By setting the parameters appropriately, we can obtain a range of indexes suited to different applications.

• Extensions
– Updating and bulk loading
– Integration with value indexes
