Post on 22-Dec-2015

1

Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II)

Xin Zhang

2

Outline

XAT Decorrelation.
Optimization:
  XAT Computation Pushdown.
  XAT Data Model Cleanup.
  XAT Cutting.
Conclusion & Future Work.

3

XAT Decorrelation

An XQuery is a correlated query.
Decorrelation is required for optimization:
  XAT Computation Pushdown.
  XAT Data Model Cleanup.
  XAT Cutting.

4

Three Kinds of Decorrelation

Simple Decorrelation
  No additional sources.
  No aggregate functions.
Complex Decorrelation with Additional Sources
Complex Decorrelation with Aggregate Functions

5

<!ELEMENT prices (book*)>
<!ELEMENT book (title, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

<prices>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title> TCP/IP Illustrated </title>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <price>39.95</price>
  </book>
</prices>

Example* of XML Use Cases.

6

Simple Query Example

In the document "prices.xml", find the book title.

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  return
    <minprice> $t </minprice>
} </results>

XAT operators:

T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t]</minprice>):col1

7

Simple Decorrelation

Linear form of the tree:
  T[FOR(CB, T2[])[T1[S1]]]  =>  T[T2[T1[S1]]]

Operators before decorrelation:

T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t]</minprice>):col1

Operators after decorrelation (FOR removed):

T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
Agg()
T(<minprice>[$t]</minprice>):col1

8

Is Simple Decorrelation Right?

Every operator, except Groupby, has the semantics of "for each" tuple in the input table.

Hence, the FOR operator can be omitted in the simple decorrelation scenario.
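The argument can be illustrated with a minimal Python sketch. It models an XAT table as a list of dicts and the Tagger as a per-tuple function. The names `tagger_minprice`, `with_for` and `without_for` are hypothetical, chosen only for this illustration, and are not part of Rainbow.

```python
# Hypothetical model: an XAT table is a list of tuples (dicts: column -> value).
titles = [{"$t": "TCP/IP Illustrated"}, {"$t": "Data on the Web"}]


def tagger_minprice(row):
    """Per-tuple Tagger: adds col1 = <minprice>[$t]</minprice>."""
    return {**row, "col1": f"<minprice>{row['$t']}</minprice>"}


def with_for(table):
    """Correlated plan: FOR evaluates its branch once per binding of $t."""
    out = []
    for binding in table:
        # The branch sees a one-tuple table holding the current binding.
        out.extend(tagger_minprice(row) for row in [binding])
    return out


def without_for(table):
    """Decorrelated plan: the Tagger is applied directly to the whole table."""
    return [tagger_minprice(row) for row in table]


same = with_for(titles) == without_for(titles)
```

Because the Tagger already works tuple by tuple, wrapping it in an explicit loop adds nothing, which is the intuition behind dropping FOR in the simple case.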

9

Two Types of Navigates

Navigate Unnesting: U
  Unnests the parent-children relationship and duplicates the parent values for each child.

Navigate Collection: C
  Nests the parent-children relationship: creates a collection of children, but keeps the single parent.
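The difference between the two navigates can be sketched in Python as follows. This is only an illustration under assumed names: `navigate_unnest`, `navigate_collect` and the `prices_by_title` lookup are hypothetical stand-ins for path evaluation over an XML document.

```python
def navigate_unnest(table, src, children_of, out):
    """Navigate Unnesting: one output tuple per child; parent values are duplicated."""
    return [{**row, out: child} for row in table for child in children_of(row[src])]


def navigate_collect(table, src, children_of, out):
    """Navigate Collection: one output tuple per parent; children kept as one list."""
    return [{**row, out: list(children_of(row[src]))} for row in table]


# Hypothetical parents (titles) with their child values (prices).
prices_by_title = {
    "Data on the Web": ["34.95", "39.95"],
    "TCP/IP Illustrated": ["65.95"],
}
table = [{"title": t} for t in prices_by_title]


def children(title):
    return prices_by_title[title]


unnested = navigate_unnest(table, "title", children, "price")
collected = navigate_collect(table, "title", children, "price")
```

Here `unnested` has one tuple per price, while `collected` keeps one tuple per title with all of its prices in a single column.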

10

Where to Use the Two Types

Navigate Unnesting: U
  FOR binding.
Navigate Collection: C
  LET binding.

11

Complex Query Example

In the document "prices.xml", find the book title and its prices.

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, $b/price </minprice>
} </results>

XAT operators:

C($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t], [col4]</minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3

12

Complex Decorrelation with Additional Source

Linear form of the tree:
  T[FOR(CB, T2[S2])[T1[S1]]]  =>  T[T2[[T1[S1], S2]]]

Operators before decorrelation:

C($b, price):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t], [col4]</minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3

Operators after decorrelation (FOR removed):

C($b, price):col4
T(<minprice>[$t], [col4]</minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
Agg()
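A minimal Python sketch of why this rewrite preserves the result. It assumes that the binary operator in the decorrelated plan pairs each title with each book and that the condition (col3=$t) then filters the pairs. The function names and the list-of-dicts table layout are hypothetical.

```python
books = [
    {"title": "TCP/IP Illustrated", "price": "65.95"},
    {"title": "TCP/IP Illustrated", "price": "65.95"},
    {"title": "Data on the Web", "price": "34.95"},
    {"title": "Data on the Web", "price": "39.95"},
]


def distinct_titles(source):
    """distinct over /book/title, keeping first-seen order."""
    seen = []
    for book in source:
        if book["title"] not in seen:
            seen.append(book["title"])
    return seen


def correlated(source):
    """FOR plan: the second source is rescanned once per binding of $t."""
    result = []
    for t in distinct_titles(source):
        b = [x for x in source if x["title"] == t]  # condition col3 = $t
        result.append((t, [x["price"] for x in b]))
    return result


def decorrelated(source):
    """Plan without FOR: combine both inputs once, keeping pairs with col3 = $t."""
    titles = distinct_titles(source)
    pairs = [(t, x) for t in titles for x in source if x["title"] == t]
    prices = {t: [] for t in titles}
    for t, x in pairs:
        prices[t].append(x["price"])
    return list(prices.items())


same = correlated(books) == decorrelated(books)
```

Both plans return each distinct title together with the collection of its prices; the decorrelated one reads the second source once instead of once per title.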

13

Full Query Example

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element.

<results> {
  for $t in distinct(document("prices.xml")/book/title)
  let $b := document("prices.xml")/book[title = $t]
  return
    <minprice> $t, <price>min($b/price/text())</price> </minprice>
} </results>

XAT operators:

C($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t], <price>[col5]</price></minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
min(col4):col5

14

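Complex Query Decorrelation with One Aggregation Function

Linear form of the tree:
  T[FOR(CB, T2[Agg(T3[])])[T1[S1]]]
  =>  T[(DM(T1))[T1, T2[Groupby(DM(T1), Agg(T3[[Distinct(T1[S1]), S2]]))]]]

DM(T1) is the data model computed from T1.

Nodes in the tree before decorrelation: T, FOR($rate), T2, Agg(), T3, S2, T1, S1.
Nodes in the tree after decorrelation: T, T2, Groupby(DM(T1), Agg()), T3, S2, Distinct, T1, S1.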

15

The Query after Decorrelation

Operators before decorrelation:

C($b, price/text()):col4
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
FOR($t)
Agg()
T(<minprice>[$t], <price>[col5]</price></minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
min(col4):col5

Operators after decorrelation (FOR removed, min replaced by a Groupby):

C($b, price/text()):col4
T(<minprice>[$t], [col4]</minprice>):col1
S("prices.xml"):R2
C(R2, /book):$b
(col3=$t)
C($b, title):col3
T(<results>[col1]</results>):col0
distinct(col2):$t
S("prices.xml"):R1
(R1, /book/title):col2
Agg()
GB(DM, min(col4):col5)
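The role of the Groupby can be sketched in Python. The sketch assumes the decorrelated plan has already produced one tuple per ($t, col4) pair; that table layout and the helper name `group_by_min` are assumptions for illustration, not Rainbow's implementation.

```python
# Hypothetical table after decorrelation: one tuple per ($t, col4) pair.
joined = [
    {"$t": "TCP/IP Illustrated", "col4": 65.95},
    {"$t": "TCP/IP Illustrated", "col4": 65.95},
    {"$t": "Data on the Web", "col4": 34.95},
    {"$t": "Data on the Web", "col4": 39.95},
]


def group_by_min(table, key, value, out):
    """GB(key, min(value):out): one output tuple per distinct key value."""
    groups = {}
    for row in table:
        # dict preserves insertion order, so groups keep first-seen order.
        groups.setdefault(row[key], []).append(row[value])
    return [{key: k, out: min(vs)} for k, vs in groups.items()]


result = group_by_min(joined, "$t", "col4", "col5")
```

Without FOR there is no longer one table per title, so the aggregate has to be computed per group; grouping on the columns that came from the outer binding restores the per-title minimum.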

16

Where are we?

XAT Decorrelation.
Optimization:
  XAT Computation Pushdown.
  XAT Data Model Cleanup.
  XAT Cutting.
Conclusion & Future Work.

17

XAT Computation Pushdown

Goal: push the execution into the relational database.

Steps:
  Push Navigation down.
  Cancel out Navigation and Tagger.
  Generate the SQL statement.

18

Navigation Pushdown

Basically, a Navigation can be pushed through all the operators until it has a dependency on its child operator.

Example rewriting rules:
  (x1, path):x2[(y1, path):y2[T]]  =>  (y1, path):y2[(x1, path):x2[T]]   (x1 != y2)
  (x1, path):x2[(c)[T]]  =>  (c)[(x1, path):x2[T]]
  (x1, path):x2[[T1, T2]]  =>  [T1, (x1, path):x2[T2]]   (if x1 in DM(T2))
  (x1, path):x2[[T1, T2]]  =>  [(x1, path):x2[T1], T2]   (if x1 in DM(T1))

Here (c)[T] is a unary operator with condition c over T, and [T1, T2] is a binary operator over T1 and T2.
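A small Python sketch of these rewriting rules as a recursive tree rewrite. It assumes the unary operator with a condition is a selection and the binary operator is a join; the `Op` class, the operator kind names and `push_down` are hypothetical and only model the rules listed here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Op:
    kind: str                       # "nav", "select", "join" or "source"
    args: Tuple = ()                # nav: (in_col, path, out_col); source: (out_col,)
    children: Tuple["Op", ...] = ()


def data_model(op):
    """Columns available in the output of op (no cleanup applied)."""
    cols = set()
    for child in op.children:
        cols |= data_model(child)
    if op.kind == "nav":
        cols.add(op.args[2])
    elif op.kind == "source":
        cols.add(op.args[0])
    return cols


def push_down(nav):
    """Push a nav operator down until it depends on its child operator."""
    child = nav.children[0]
    x1 = nav.args[0]
    if child.kind == "select" or (child.kind == "nav" and x1 != child.args[2]):
        moved = push_down(Op("nav", nav.args, child.children))
        return Op(child.kind, child.args, (moved,))
    if child.kind == "join":
        left, right = child.children
        if x1 in data_model(right):
            return Op("join", child.args, (left, push_down(Op("nav", nav.args, (right,)))))
        if x1 in data_model(left):
            return Op("join", child.args, (push_down(Op("nav", nav.args, (left,))), right))
    return nav  # dependency on the child (or a source): stop here


s1 = Op("source", ("$t",))
s2 = Op("source", ("R2",))
books = Op("nav", ("R2", "/book", "$b"), (s2,))
plan = Op("nav", ("$b", "price", "col4"),
          (Op("select", ("col3=$t",), (Op("join", (), (s1, books)),)),))
pushed = push_down(plan)
```

In this example the navigation on `$b` moves below the selection and into the right input of the join, where it stops because it consumes the column produced by the navigation directly beneath it.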

19

Navigation Pushdown Example

C($b, price/text()):col4

T (<minprice> [$t], [col4]</minprice>):col1

S(“prices.xml”):R2

C(R2, /book):$b

(col3=$t)

C($b, title):col3

T(<results>[col1]</results>):col0

distinct(col2):$t

S(“prices.xml”):R1

(R1, /book/title):col2

Agg()

GB(DM, min(col4):col5)

C($b, price/text()):col4

T (<minprice> [$t], [col4]</minprice>):col1

S(“prices.xml”):R2

C(R2, /book):$b

(col3=$t)

C($b, title):col3

T(<results>[col1]</results>):col0

distinct(col2):$t

S(“prices.xml”):R1

(R1, /book/title):col2

Agg()

GB(DM, min(col4):col5)

20

Navigation/Tagger Cancel Out

Used to simplify a composite XAT tree.

Transformation rule:
  (x, /):y[T(<tag>[z]</tag>):x[s]]  =>  s

Note: type analysis is also used for the cancel out.
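The intuition behind the rule can be checked with a short Python sketch using the standard `xml.etree.ElementTree` module. The helpers `tagger` and `navigate_children` are hypothetical models of the two operators: navigating into an element that a Tagger has just built returns exactly the content the Tagger was given.

```python
import xml.etree.ElementTree as ET


def tagger(row, tag, content_col, out_col):
    """T(<tag>[z]</tag>):x -- wrap the nodes held in column z in a new element."""
    elem = ET.Element(tag)
    elem.extend(row[content_col])
    return {**row, out_col: elem}


def navigate_children(row, in_col, out_col):
    """(x, /):y -- collect the children of the element held in column x."""
    return {**row, out_col: list(row[in_col])}


title = ET.Element("title")
title.text = "Data on the Web"
price = ET.Element("price")
price.text = "34.95"
s = {"z": [title, price]}

composed = navigate_children(tagger(s, "book", "z", "x"), "x", "y")
cancels = composed["y"] == s["z"]
```

Since column y ends up holding the same nodes as column z, the Tagger/Navigation pair can be dropped and later operators can read z directly, which is what the rule expresses.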

21

View Query Example

<DB>
  <book>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title> TCP/IP Illustrated </title>
      <price>65.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>34.95</price>
    </row>
    <row>
      <title>Data on the Web</title>
      <price>39.95</price>
    </row>
  </book>
</DB>

<prices> {
  for $row in distinct(DXV/book/row)
  return
    <book> $row/title, $row/price </book>
} </prices>

XAT operators:

T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

22

Cancel Out Example (1)

Rule applied:
  (x, y)[op():x[s]]  =>  op():y[s]

Before:

C($b, price/text()):col4
S("prices.xml"):R2
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):col5
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

23

Cancel Out Example (2)

Before:

C($b, price/text()):col4
C(R2, /book):$b
C($b, title):col3
...
T(<prices>[col6]</prices>):R2
T(<book>[col7],[col8]</book>):col6
S(DXV):R3
(R3, /book/row):$row
Agg()
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8

24

Cancel Out Example (3)

Before:

C($b, price/text()):col4
C($b, title):col3
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col7
($row, price):col8

After:

C($b, price/text()):col4
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8

25

Cancel Out Example (4)

Before:

C($b, price):temp1
...
T(<book>[col7],[col8]</book>):$b
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):col8
C(temp1, text()):col4

After:

...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4

26

SQL Generation

Find a pattern in the XAT.
Translate that pattern into a SQL operator that will access the relational database.

27

SQL Generation Example

Before:

...
S(DXV):R3
(R3, /book/row):$row
($row, title):col3
($row, price):temp1
C(temp1, text()):col4

After:

...
SQL(select title as col3, price as temp1 from book):{col3, temp1}
C(temp1, text()):col4
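The translation step can be sketched with Python's standard `sqlite3` module. The function `navigations_to_sql` is a hypothetical helper that handles only the pattern in this example (a source over the default view, a row navigation, then one navigation per column); the in-memory `book` table is assumed test data mirroring the view shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [
        ("TCP/IP Illustrated", 65.95),
        ("TCP/IP Illustrated", 65.95),
        ("Data on the Web", 34.95),
        ("Data on the Web", 39.95),
    ],
)


def navigations_to_sql(table, columns):
    """Turn '/table/row' plus one child navigation per column into a SELECT.

    Identifiers are assumed to come from the trusted query plan, not from users.
    """
    select_list = ", ".join(f"{child} AS {alias}" for child, alias in columns)
    return f"SELECT {select_list} FROM {table}"


sql = navigations_to_sql("book", [("title", "col3"), ("price", "temp1")])
cursor = conn.execute(sql)
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

The remaining operators of the XAT tree then consume `col3` and `temp1` from the rows returned by the database instead of navigating the XML view.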

28

Where are we?

XAT Decorrelation.
Optimization:
  XAT Computation Pushdown.
  XAT Data Model Cleanup.
  XAT Cutting.
Conclusion & Future Work.

29

XAT Data Model Cleanup

By default, each operator appends one additional column to the data model.

Cleanup is used to help:
  Execution: optimize the data storage during the execution.
  Cutting: get rid of the unused operators in the XQuery.

Equation for data model cleanup (only keep the columns required by ancestors):
  DM := (DMp – Pp) ∪ Cp ∪ (P – C)
where P and C are the columns an operator produces and consumes, and the subscript p marks its parent operator.

30

Data Model Example

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node  Operator               Produce    Consume  DM before                 DM after
1     Agg()                  {}         {}       {$prices, R1, $b, col1}   {}
2     ($b,):col1             {col1}     {$b}     {$prices, R1, $b, col1}   {col1}
3     C($b, price):$prices   {$prices}  {$b}     {$prices, R1, $b}         {$b, $prices}
4     (R1, /book):$b         {$b}       {R1}     {R1, $b}                  {$b}
5     S("prices.xml"):R1     {R1}       {}       {R1}                      {R1}

DM := (DMp – Pp) ∪ Cp ∪ (P – C)
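The cleanup equation can be replayed over the example in a few lines of Python. The sketch reads the two connectives of the equation as set unions, an interpretation that reproduces the "DM after" column of the table. The function name `cleanup` and the root-to-leaf list encoding of the operator chain are assumptions made for illustration.

```python
def cleanup(ops):
    """ops: (node, produce, consume) from the root down a linear operator chain.

    Returns the cleaned-up data model per node, computed as
    DM = (DM_parent - P_parent) | C_parent | (P - C).
    """
    result = {}
    dm_p, p_p, c_p = set(), set(), set()  # the root has no parent
    for node, p, c in ops:
        dm = (dm_p - p_p) | c_p | (p - c)
        result[node] = dm
        dm_p, p_p, c_p = dm, p, c
    return result


chain = [
    (1, set(), set()),          # Agg()
    (2, {"col1"}, {"$b"}),      # ($b,):col1
    (3, {"$prices"}, {"$b"}),   # C($b, price):$prices
    (4, {"$b"}, {"R1"}),        # (R1, /book):$b
    (5, {"R1"}, set()),         # S("prices.xml"):R1
]
dm_after = cleanup(chain)
```

Each operator keeps what its parent still needs from below, plus what the parent consumes, plus its own new columns, so columns such as R1 disappear as soon as no ancestor uses them.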

31

Where are we?

XAT Decorrelation.
Optimization:
  XAT Computation Pushdown.
  XAT Data Model Cleanup.
  XAT Cutting.
Conclusion & Future Work.

32

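XAT Cutting

General idea:
  Get rid of the operators that produce unused data.

Equations:
  R := (Rp – P) ∪ C
  Cut the operator if (P ∪ M) ∩ (Rp ∪ Mp) = NULL
where R is the set of required columns, P, C and M are the columns an operator produces, consumes and modifies, and the subscript p marks its parent operator.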

33

XAT Cutting Example

R := (Rp – P) ∪ C
Cut the operator if (P ∪ M) ∩ (Rp ∪ Mp) = NULL

for $b in document("prices.xml")/book
let $prices := $b/price
return $b

Node  Operator               Produce    Consume  Modified  Required  Cut?
1     Agg()                  {}         {}       {*}       {}        N/A
2     ($b,):col1             {col1}     {$b}     {}        {$b}      {col1}
3     C($b, price):$prices   {$prices}  {$b}     {}        {$b}      {}
4     (R1, /book):$b         {$b}       {R1}     {}        {R1}      {$b}
5     S("prices.xml"):R1     {R1}       {}       {}        {}        {R1}

The "Cut?" column holds the value of (P ∪ M) ∩ (Rp ∪ Mp). Only node 3 yields the empty set, so the navigation that produces $prices is cut.
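The two cutting equations can be replayed over the example with a short Python sketch. It reads the equations as R = (Rp - P) ∪ C and "cut if (P ∪ M) ∩ (Rp ∪ Mp) is empty", an interpretation that reproduces the Required and Cut? columns of the table. The wildcard handling for {*} and all names are assumptions made for illustration.

```python
ALL = "*"  # wildcard marker: the root modifies (exposes) every column


def union(a, b):
    return {ALL} if ALL in a or ALL in b else a | b


def intersect(a, b):
    if ALL in a:
        return set(b)
    if ALL in b:
        return set(a)
    return a & b


def cutting(ops):
    """ops: (node, produce, consume, modified) from the root down a linear chain."""
    required, cut = {}, {}
    r_p, m_p = set(), None  # the root has no parent
    for node, p, c, m in ops:
        required[node] = (r_p - p) | c
        if m_p is not None:  # the root itself is never cut
            cut[node] = not intersect(union(p, m), union(r_p, m_p))
        r_p, m_p = required[node], m
    return required, cut


chain = [
    (1, set(), set(), {ALL}),          # Agg()
    (2, {"col1"}, {"$b"}, set()),      # ($b,):col1
    (3, {"$prices"}, {"$b"}, set()),   # C($b, price):$prices
    (4, {"$b"}, {"R1"}, set()),        # (R1, /book):$b
    (5, {"R1"}, set(), set()),         # S("prices.xml"):R1
]
required, cut = cutting(chain)
```

An operator survives only if something it produces or modifies is required or modified by its parent; the LET navigation fails that test because `$prices` is never used by the return clause.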

34

Conclusions

XQuery queries are heavily correlated, hence they need to be decorrelated for better optimization.

After decorrelation, more optimization techniques can be applied:
  Computation Pushdown.
  Data Model Cleanup.
  Cutting.

35

Future Work

Write a TR to formalize the XAT.
  Compare with ORDB, ODB, and also XQA operators.

Wrap up:
  Finalize the uncertain operators that deal with collections: Union, Navigate.
  Formalize the pushdown rewriting rules by type (reg. exp. type) analysis.
  Finalize the XAT rewriting rules for:
    Order handling.
    Update propagation.
  Translation from XAT back to query.

Next step:
  Generate the search space and optimization algorithm for XAT, ready for schema generation.
