Post on 22-Dec-2015

Introduction to XML Algebra

Based on talk prepared for CS561 by Wan Liu and Bintou Kane

Data Model

A data model is the set of core data structures and data types supported by a DBMS.

A relational database uses a table (set-oriented) data model.

The XML format is a tree-structured, hierarchical model.

Why XML Algebra?

It is common to translate a query language into an algebra.

First, the algebra is used to give a semantics for the query language.

Second, the algebra is used to support query optimization.

NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Univ. of Wisconsin

Outline

Concepts of Niagara Algebra

Operations

Optimization

Goals of Niagara Algebra

Be independent of schema information

Query on both structure and content

Generate simple, flexible, yet powerful algebraic expressions

Allow re-use of traditional optimization techniques

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
    <invoice No="1">
        <account_number>2 </account_number>
        <carrier>AT&amp;T</carrier>
        <total>$0.25</total>
    </invoice>
    <invoice>
        <account_number>1 </account_number>
        <carrier>Sprint</carrier>
        <total>$1.20</total>
    </invoice>
    <invoice>
        <account_number>1 </account_number>
        <carrier>AT&amp;T</carrier>
        <total>$0.75</total>
    </invoice>
</Invoice_Document>

Customer.xml

<Customer_Document>
    <customer>
        <account>1 </account>
        <name>Tom </name>
    </customer>
    <customer>
        <account>2 </account>
        <name>George </name>
    </customer>
</Customer_Document>

XML Data Model and Tree Graph

Example:

<Invoice_Document>
    <invoice>
        <number>2</number>
        <carrier>AT&amp;T</carrier>
        <total>$0.25</total>
    </invoice>
    <invoice>
        <number>1</number>
        <carrier>Sprint</carrier>
        <total>$1.20</total>
    </invoice>
</Invoice_Document>

Tree graph:

Invoice_Document
    invoice
        number: 2
        carrier: AT&T
        total: $0.25
    invoice
        number: 1
        carrier: Sprint
        total: $1.20
    …

Ordered tree graph, semi-structured data

XML Data Model [GVDNM01]

Collection of bags of vertices.

Vertices in a bag have no order.

Example: [Root "invoice.xml", invoice, invoice.account_number]

invoice: <invoice> invoice-element-content </invoice>

invoice.account_number: <account_number> element-content </account_number>

Data Model

Bag elements are reachable by path expressions.

A path expression consists of two parts: an entry point and a relative forward part.

Example: account_number:invoice (entry point invoice, relative forward part account_number)

Operators

Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.

Source Operator S

Input: a list of documents

Output: a collection of singleton bags

Examples:

S (*): all known XML documents

S (invoice*.xml): all XML documents whose filename matches "invoice*.xml"

S (*, schema.dtd): all known XML documents that conform to schema.dtd
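A minimal sketch of the Source operator, assuming the "known documents" are just a list of file names and that a singleton bag can be modelled as a one-entry dict. The function name source and the sample file names are hypothetical; the schema.dtd variant is not covered.

```python
from fnmatch import fnmatch


def source(known_documents, pattern="*"):
    """Return one singleton bag per known document whose name matches the pattern."""
    return [{"Root": name} for name in known_documents if fnmatch(name, pattern)]


# Hypothetical set of known documents.
known = ["invoice1.xml", "invoice2.xml", "customer.xml"]

all_bags = source(known)                      # S (*)
invoice_bags = source(known, "invoice*.xml")  # S (invoice*.xml)
print(invoice_bags)
```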

Follow operator

Input: a path expression in entry point notation

Functionality: extracts vertices reachable by the path expression

Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of an unnesting follow)

Follow operator (Example*)

Input: {[Root invoice.xml, invoice]}

Operator: Φμ(carrier:invoice)

Output: {[Root invoice.xml, invoice, invoice.carrier]}

Here invoice is <invoice> invoice-element-content </invoice> and invoice.carrier is <carrier> carrier-element-content </carrier>.

*Unnesting Follow
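The unnesting follow can be sketched as below. This is an illustration only: bags are modelled as dicts from labels to ElementTree elements, the function name follow is hypothetical, and the forward part is assumed to be a single child step.

```python
import xml.etree.ElementTree as ET


def follow(bags, entry, step):
    """Unnesting follow for the path expression step:entry."""
    out = []
    for bag in bags:
        # One output bag per vertex reachable from the entry point.
        for vertex in bag[entry].findall(step):
            new_bag = dict(bag)                  # keep all contents of the original bag
            new_bag[f"{entry}.{step}"] = vertex  # add the extracted vertex
            out.append(new_bag)
    return out


invoice = ET.fromstring(
    "<invoice><carrier>Sprint</carrier><total>$1.20</total></invoice>"
)
bags = [{"Root": "invoice.xml", "invoice": invoice}]
result = follow(bags, "invoice", "carrier")
print(list(result[0]))  # labels of the output bag
```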

Select operator

Input: a set of bags

Functionality: filters the bags of a collection using a predicate

Output: a set of bags that conform to the predicate

Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)

Select operator (Example)

Predicate: invoice.carrier = Sprint

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}

Output: {[Root invoice.xml, invoice], …}

Each invoice vertex stands for <invoice> invoice-element-content </invoice>.
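As a rough sketch, Select is a filter over a collection of bags. Here a bag is a dict and the predicate is an ordinary Python function; the name select and the flattened string values are assumptions made for brevity.

```python
def select(bags, predicate):
    """Keep only the bags that satisfy the predicate."""
    return [bag for bag in bags if predicate(bag)]


bags = [
    {"Root": "invoice.xml", "invoice.carrier": "AT&T"},
    {"Root": "invoice.xml", "invoice.carrier": "Sprint"},
    {"Root": "invoice.xml", "invoice.carrier": "AT&T"},
]

sprint = select(bags, lambda bag: bag["invoice.carrier"] == "Sprint")
print(sprint)
```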

Join operator

Input: two collections of bags

Functionality: joins the two collections based on a predicate

Output: the concatenation of pairs of bags that satisfy the predicate

Join operator (Example)

Predicate: account_number:invoice = number:customer

Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}

Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

Here invoice is <invoice> invoice-element-content </invoice> and customer is <customer> customer-element-content </customer>.
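A possible sketch of Join as a nested loop. Bags are modelled here as lists of (label, value) pairs rather than dicts, so that the concatenated bag can carry both Root entries; the name join, the labels, and the flattened values are illustrative assumptions.

```python
def join(left, right, predicate):
    """Concatenate every pair of bags (one from each side) that satisfies the predicate."""
    return [l + r for l in left for r in right if predicate(dict(l), dict(r))]


invoices = [
    [("Root", "invoice.xml"), ("invoice.account_number", "2")],
    [("Root", "invoice.xml"), ("invoice.account_number", "1")],
]
customers = [
    [("Root", "customer.xml"), ("customer.account", "1")],
    [("Root", "customer.xml"), ("customer.account", "2")],
]

pairs = join(
    invoices,
    customers,
    lambda i, c: i["invoice.account_number"] == c["customer.account"],
)
print(pairs[0])
```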

Expose operator

Input: a list of path expressions of vertices to be exposed

Output: a set of bags that contain the vertices named in the parameter list, in the same order as the list

Expose operator (Example)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}

Operator: (bill_period, carrier)

Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

Here invoice is <invoice> invoice-element-content </invoice>, invoice.carrier is <carrier> carrier-element-content </carrier>, and invoice.bill_period is <bill_period> bill_period-element-content </bill_period>.
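Expose can be sketched as a projection that keeps the Root entry and the listed vertices in parameter order. Bags are dicts (which preserve insertion order in Python); the name expose and the sample values, including the bill_period value, are hypothetical.

```python
def expose(bags, labels):
    """Project each bag onto Root plus the listed labels, in the order given."""
    return [
        {"Root": bag["Root"], **{label: bag[label] for label in labels}}
        for bag in bags
    ]


bags = [{
    "Root": "invoice.xml",
    "invoice": "<invoice>...</invoice>",
    "invoice.carrier": "Sprint",
    "invoice.bill_period": "2001-01",  # made-up value for illustration
}]

exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
print(list(exposed[0]))
```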

Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example :

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
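One way to picture the Vertex example is as element construction: a new Customer_invoice element wraps the exposed values under the new names account and inv_total. The sketch uses Python's xml.etree.ElementTree; the helper name vertex and the bag contents are assumptions.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """Build a new element named tag that wraps (child_tag, text) pairs."""
    element = ET.Element(tag)
    for child_tag, text in children:
        ET.SubElement(element, child_tag).text = text
    return element


bag = {"invoice.account_number": "1", "invoice.total": "$1.20"}
customer_invoice = vertex("Customer_invoice", [
    ("account", bag["invoice.account_number"]),
    ("inv_total", bag["invoice.total"]),
])
serialized = ET.tostring(customer_invoice, encoding="unicode")
print(serialized)
```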

Other operators

Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.

Rename: changes the entry point annotation of the elements of a bag. Example: (invoice.bill_period, date)
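A small sketch of Group with an aggregate, assuming bags are dicts and invoice totals have already been converted to integer cents. The function name group and the label invoice.total_cents are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group(bags, key, value, aggregate):
    """Group bags by the value under key and aggregate the values under value."""
    groups = defaultdict(list)
    for bag in bags:
        groups[bag[key]].append(bag[value])
    return {k: aggregate(v) for k, v in groups.items()}


bags = [
    {"invoice.account_number": "2", "invoice.total_cents": 25},
    {"invoice.account_number": "1", "invoice.total_cents": 120},
    {"invoice.account_number": "1", "invoice.total_cents": 75},
]

averages = group(bags, "invoice.account_number", "invoice.total_cents", mean)
print(averages)
```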

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
    <invoice>
        <account_number>2 </account_number>
        <carrier>AT&amp;T</carrier>
        <total>$0.25</total>
    </invoice>
    <invoice>
        <account_number>1 </account_number>
        <carrier>Sprint</carrier>
        <total>$1.20</total>
    </invoice>
    <invoice>
        <account_number>1 </account_number>
        <total>$0.75</total>
    </invoice>
    <auditor> maria </auditor>
</Invoice_Document>

Customer.xml

<Customer_Document>
    <customer>
        <account>1 </account>
        <name>Tom </name>
    </customer>
    <customer>
        <account>2 </account>
        <name>George </name>
    </customer>
</Customer_Document>

XQuery Example

List account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and
      $i/account_number = $c/account
return
    <Sprint_invoices>
        { $i/account_number, $c/name, $i/total }
    </Sprint_invoices>

Example: XQuery output

<Sprint_invoices>
    <account_number>1 </account_number>
    <name>Tom </name>
    <total>$1.20</total>
</Sprint_invoices>

Algebra Tree Execution

Operators are listed bottom-up, each followed by the vertices it produces.

Source (invoices.xml), Follow (*.invoice): invoice (1), invoice (2), invoice (3)

Select (carrier = "Sprint"): invoice (2)

Source (customers.xml), Follow (*.customer): customer (1), customer (2)

Join (*.invoice.account_number = *.customer.account): invoice (2), customer (1)

Expose (*.account_number, *.name, *.total): account_number, name, total
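The plan can be traced end to end with a short script. This is only a sketch of the operator order (Source, Follow, Select, Join, Expose) using Python's xml.etree.ElementTree over inline copies of the two sample documents; it is not the Niagara engine, and the helper text is hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2 </account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1 </account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
  <invoice><account_number>1 </account_number><total>$0.75</total></invoice>
  <auditor> maria </auditor>
</Invoice_Document>"""

CUSTOMERS = """<Customer_Document>
  <customer><account>1 </account><name>Tom </name></customer>
  <customer><account>2 </account><name>George </name></customer>
</Customer_Document>"""


def text(element, tag):
    """Stripped text of the first child named tag, or '' if there is none."""
    return (element.findtext(tag) or "").strip()


# Source + Follow: one vertex per invoice / customer element.
invoices = ET.fromstring(INVOICES).findall("invoice")
customers = ET.fromstring(CUSTOMERS).findall("customer")

# Select: keep invoices whose carrier is Sprint.
sprint = [i for i in invoices if text(i, "carrier") == "Sprint"]

# Join: pair invoices and customers on the account number.
joined = [
    (i, c)
    for i in sprint
    for c in customers
    if text(i, "account_number") == text(c, "account")
]

# Expose: account number, customer name, invoice total.
rows = [(text(i, "account_number"), text(c, "name"), text(i, "total")) for i, c in joined]
print(rows)
```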

Optimization with Niagara

Optimizer based on Niagara algebra:

Use operations more efficiently

Produce simpler expressions by combining operations

Language Convention

A and B are path expressions

A < B --- path expression A is a prefix of B

AnB --- common prefix of paths A and B

AńB --- greatest common prefix of paths A and B

┴ --- null path expression
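These conventions can be sketched with path expressions modelled as tuples of steps, the entry point first. The function names are hypothetical, and the empty tuple is assumed to stand in for the null path expression ┴.

```python
def is_prefix(a, b):
    """True if path a is a (not necessarily proper) prefix of path b."""
    return len(a) <= len(b) and b[:len(a)] == a


def greatest_common_prefix(a, b):
    """Longest path that is a prefix of both a and b; () plays the role of ┴."""
    common = []
    for step_a, step_b in zip(a, b):
        if step_a != step_b:
            break
        common.append(step_a)
    return tuple(common)


acc_num = ("invoice", "acc_Num")
carrier = ("invoice", "carrier")
customer_acc = ("customer", "acc_Num")

print(is_prefix(("invoice",), carrier))
print(greatest_common_prefix(acc_num, carrier))
print(greatest_common_prefix(acc_num, customer_acc))
```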

Heuristics using Rewrite Rules

Allow optimization based on path selectivity

when applying the unnesting Follow operation Φμ

Interchangeability of Follow operation

Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]

TRUE when there exists C such that C < A and C < B and C = AńB,

or when AnB = ┴
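The side condition of the rule can be checked mechanically. In this sketch a path expression such as acc_Num:invoice is parsed into a tuple starting at the entry point; "prefix" is read as proper prefix, the empty tuple stands for ┴, and a dotted forward part is assumed. All of these are assumptions of the sketch, and the function names are hypothetical.

```python
def parse(expression):
    """Turn 'forward:entry' into a tuple of steps starting at the entry point."""
    forward, entry = expression.split(":")
    return (entry, *forward.split("."))


def greatest_common_prefix(a, b):
    """Longest path that is a prefix of both a and b; () plays the role of ┴."""
    common = []
    for step_a, step_b in zip(a, b):
        if step_a != step_b:
            break
        common.append(step_a)
    return tuple(common)


def can_interchange(a, b):
    """True if the two unnesting follows may be swapped under the rule's condition."""
    c = greatest_common_prefix(a, b)
    if not c:
        return True  # no common prefix: the paths are unrelated
    # C must be a proper prefix of both A and B.
    return len(c) < len(a) and len(c) < len(b)


same_entry = can_interchange(parse("acc_Num:invoice"), parse("carrier:invoice"))
unrelated = can_interchange(parse("acc_Num:invoice"), parse("acc_Num:customer"))
dependent = can_interchange(("invoice",), ("invoice", "carrier"))
print(same_entry, unrelated, dependent)
```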

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

=?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

=?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]

Equivalent because both share the common prefix “invoice”.

Case AńB = invoice

Benefit of Rule Application

NOTE: let us assume that acc_Num is required for each invoice element, while carrier is not required for an invoice element.

THEN:

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

=?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Then which algebra tree do we prefer?

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) makes more sense than (**). Why?

Discussion

Reduction of input size on the first sub-operation:

Φμ(carrier:invoice)
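A toy illustration of the input-size argument, assuming that an unnesting follow emits no bag when the step is absent from the entry vertex. Invoices are modelled as dicts from child names to lists of values; all names and data are hypothetical.

```python
def follow(bags, entry, step):
    """Unnesting follow: one output bag per value found; none if the step is missing."""
    out = []
    for bag in bags:
        for value in bag[entry].get(step, []):
            out.append({**bag, f"{entry}.{step}": value})
    return out


invoices = [
    {"acc_Num": ["2"], "carrier": ["AT&T"]},
    {"acc_Num": ["1"], "carrier": ["Sprint"]},
    {"acc_Num": ["1"]},  # carrier is optional and missing here
]
bags = [{"invoice": inv} for inv in invoices]

# Order in which carrier is followed first: the optional step prunes early.
after_carrier = follow(bags, "invoice", "carrier")
carrier_first = follow(after_carrier, "invoice", "acc_Num")

# Order in which acc_Num is followed first: nothing is pruned early.
after_acc_num = follow(bags, "invoice", "acc_Num")
acc_num_first = follow(after_acc_num, "invoice", "carrier")

print(len(after_carrier), len(after_acc_num))
```

Both orders produce the same bags, but following the optional carrier step first hands fewer bags to the second follow.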

Should we/can we apply the rule below?

Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]

"acc_Num:invoice" and "acc_Num:customer" are two totally different paths.

Case is: AnB = ┴

So yes, the rule is valid.