
Post on 19-Dec-2015


1

Rewriting Nested XML Queries Using Nested Views

Nicola Onose

joint work with

Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola

University of California, San Diego

2

The problem

• views defined by queries V1, …, Vn and materialized as docV1, …, docVn

• Can we answer Q using only view access paths?

[Figure: the views V1, …, Vn are computed over the input XML data and materialized as docV1, …, docVn; the query Q computes the query result.]

INTRO

3

The problem

• views defined by queries V1, …, Vn and materialized as docV1, …, docVn

• is there a query R such that R(V1(Input), …, Vn(Input)) = Q(Input)?

[Figure: the same setting, with the rewriting query R reading docV1, …, docVn and producing the same query result as the query Q over the input XML data.]

4

Motivation: caching & indexes

• caching: answer new queries using results of previously answered ones

• (partial) indexes: materialized references to frequently accessed parts of the data

[Figure: the rewriting query R reads docV1, …, docVn, the materialized views, which are faster to access than the original input; the query Q reads the input XML data.]

5

Motivation: security views

• security: checking the existence of R is a security problem. Allow only queries that can be expressed in terms of certain permitted queries, the security views.

[Figure: V1, …, Vn are the security views (permitted queries) over the input XML data; the rewriting query R reads only docV1, …, docVn.]

6

Motivation: data integration

• data integration: given a query expressed in global terms, rewrite it using the descriptions of the particular sources

[Figure: the query Q is posed against a virtual global DB; the rewriting query R reads source1, …, sourcen; the local/global mappings are expressed as views.]

7

Rewritings enabled by pattern matching

• Previous literature: find parts of the query that are precomputed by the views.

• How to decide that: match the patterns of the views into the query
  – in the relational case, the patterns were tableaux and conjunctive queries
  – for XPath: tree patterns

• Matching XML queries?
  – (until recently) no pattern-based description of XQuery semantics
  – Nested XML Tableaux (NEXT) come to fill the gap

The NEXT Logical Framework for XQuery, A. Deutsch et al., VLDB'04

8

Scope of Our Approach

• Nested XML Tableaux (NEXT) extend previous work on tree patterns.

• NEXT+ extends NEXT to the whole XQuery.

Tree patterns cover XPath.

NEXT extends tree patterns with:
- nested for-loops
- joins
- element construction, etc.

NEXT+ extends NEXT to the whole XQuery language, including:
- function calls
- universal quantification
- disjunction, negation, etc.

9

Scope of Our Approach

[Figure: the same three language classes as on the previous slide (tree patterns, NEXT, NEXT+), annotated with two guarantees.]

• soundness guarantee: if a rewriting is found, it is equivalent to the original query

• completeness guarantee: if a rewriting exists, we will find one

10

Rewriting using views example

[Figure: data on the Web, bib.xml: book elements with title and author children.]

Query Q: group titles by author
for each distinct author, output the titles of his/her books

View V: group authors by title
for each book, output its title and the list of authors

The result of the view is cached and has faster access time than getting the data directly from the source.

Rewriting R
scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author

11

Rewriting using views example

Query Q: group titles by author
for each distinct author, output the titles of his/her books

View V: group authors by title
for $b1 in $doc//book, $t1 in $b1/title
return <authorlist> {$t1, $b1/author} </authorlist>

Rewriting R
scan the view and create an entry for each distinct author in the view output; add to it all the titles of the respective author

Previous work captures:
- XPath navigation

12

Rewriting using views example

Query Q: group titles by author
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
</bibentry>

View V: group authors by title
for $b1 in $doc//book, $t1 in $b1/title
return <authorlist> {$t1, $b1/author} </authorlist>

Previous work captures:
- XPath navigation

NEXT captures:
- XPath navigation
- nested for loops
- joins
- element construction, etc.

15

Rewriting using views example

[Figure: data on the Web, bib.xml: book elements with title and author children.]

Query Q: group titles by author
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
</bibentry>

View V: group authors by title
for $b1 in $doc//book, $t1 in $b1/title
return <authorlist> {$t1, $b1/author} </authorlist>

Rewriting R
for $a3 in distinct-values($docV/authorlist[title]/author)
return <bibentry>
  { $a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3 }
</bibentry>

In R, $docV is bound to the root of the view output, and the paths under it navigate inside the view output.
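The claim that R over the view output gives the same answer as Q over the source can be tried on a toy instance. The sketch below is a simulation in Python, not XQuery: the data, the function names and the list-of-tuples encoding of the XML results are all assumptions made for illustration, and `distinct` only approximates distinct-values by keeping first-occurrence order.

```python
# Toy stand-in for bib.xml: each book has a title and a list of authors.
# The data and every name below are hypothetical, for illustration only.
BOOKS = [
    {"title": "T1", "authors": ["Ann", "Bob"]},
    {"title": "T2", "authors": ["Bob"]},
    {"title": "T3", "authors": ["Ann", "Cy"]},
]


def distinct(values):
    """Distinct values in first-occurrence order (stand-in for distinct-values)."""
    return list(dict.fromkeys(values))


def view_v(books):
    """View V: one (title, authors) entry, i.e. one authorlist, per book title."""
    return [(b["title"], list(b["authors"])) for b in books]


def query_q(books):
    """Query Q over the source: for each distinct author, the titles of their books."""
    authors = distinct(a for b in books for a in b["authors"])
    return [(a, [b["title"] for b in books if a in b["authors"]]) for a in authors]


def rewriting_r(doc_v):
    """Rewriting R: the same grouping, but reading only the view output."""
    authors = distinct(a for _, auths in doc_v for a in auths)
    return [(a, [t for t, auths in doc_v if a in auths]) for a in authors]


doc_v = view_v(BOOKS)
# R(V(Input)) = Q(Input) on this instance.
assert rewriting_r(doc_v) == query_q(BOOKS)
```

One matching instance is only a sanity check; equivalence on all inputs is what step 3 of the rewriting algorithm establishes.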

16

Outline

• NEXT (NEsted XML Tableaux)
• Rewriting Algorithm and Extensions
• Experiments
• Previous Work
• Conclusions

18

Architecture of the NEXT framework

[Figure: architecture pipeline. XQuery query and views → Normalization → Nested XML Tableaux (NEXT) → Minimization and Rewriting Using Views → Nested XML Tableaux (NEXT). From there, either Logical Optimization → Logical Plan → Plan Execution Engine, or Translate to XQuery → to any XQuery processor. Labels: NEXT patterns. Callouts: "VLDB'04" and "presented at this conference".]

19

The need for normalization

[Figure: XQuery query and views → Normalization → Nested XML Tableaux (NEXT)]

XQuery query:
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
</bibentry>

20

Normalization into NEXT

[Figure: XQuery query and views → Normalization → Nested XML Tableaux (NEXT)]

XQuery query:
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
</bibentry>

NEXT:
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    return $t }
</bibentry>

21

Normalization into NEXT

[Figure: XQuery query and views → Normalization → Nested XML Tableaux (NEXT)]

XQuery query:
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $t in $b/title
    where some $a1 in $b/author satisfies $a1 eq $a
    return $t }
</bibentry>

Callout: cardinality?

NEXT:
for $a in distinct-values($doc//book[title]/author)
return <bibentry>
  { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby [$b], [$t]
    return $t }
</bibentry>
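The "cardinality?" callout can be illustrated with a small simulation. This is a hedged sketch in Python rather than XQuery, on made-up data. It assumes the point of the groupby clause is to keep one result per ($b, $t) binding once the existentially quantified $a1 has become an ordinary for-variable. That reading is inferred from the two normalized versions shown on the slides, not stated by them.

```python
# Hypothetical data: one book whose author list repeats the same value.
BOOKS = [{"title": "T1", "authors": ["Ann", "Ann"]}]
A = "Ann"  # the outer binding of $a

# Original inner block: "some $a1 in $b/author satisfies $a1 eq $a".
# The existential test passes at most once per ($b, $t) binding.
with_some = [b["title"] for b in BOOKS if any(a1 == A for a1 in b["authors"])]

# Naive normalization: $a1 becomes a regular for-variable.
# Every matching author now yields its own binding, so the title repeats.
with_for = [b["title"] for b in BOOKS for a1 in b["authors"] if a1 == A]

# Normalization with "groupby [$b], [$t]": keep one result per (book, title)
# binding. The list index i stands in for the node identity of $b.
seen, with_groupby = set(), []
for i, b in enumerate(BOOKS):
    for a1 in b["authors"]:
        if a1 == A and (i, b["title"]) not in seen:
            seen.add((i, b["title"]))
            with_groupby.append(b["title"])

assert with_some == with_groupby
assert with_for != with_some
```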

22

NEXT Patterns

• alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns

• graphical representation of NEXT: nested patterns

View V:
for $b1 in $doc//book, $t1 in $b1/title
groupby [$b1], [$t1]
return <authorlist>
  { $t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
</authorlist>

[Figure: nested patterns for view V, a forest of tree patterns.
B1(V): $doc // book($b1) / title($t1); groupby [$b1], [$t1]; return <authorlist> $t1, B2(V) </authorlist>.
B2(V): book($b1) / author($a2); groupby [$a2]; return $a2.]

23

NEXT Patterns

[Figure: the same nested patterns B1(V) and B2(V) for view V as on the previous slide, with callouts marking descendant navigation ($doc//book) and child navigation ($b1/title, $b1/author).]

24

NEXT Patterns

[Figure: the same nested patterns B1(V) and B2(V) for view V, with a callout marking the return function: <authorlist> $t1, B2(V) </authorlist>.]

25

NEXT Patterns

[Figure: the same nested patterns B1(V) and B2(V) for view V, with a callout marking the list of groupby variables: [$b1], [$t1].]

26

NEXT Patterns

• alternative way of defining the XQuery semantics (but equivalent to the standard), given by matching patterns

• graphical representation of NEXT: nested patterns

View V:
for $b1 in $doc//book, $t1 in $b1/title
groupby [$b1], [$t1]
return <authorlist>
  { $t1,
    for $a2 in $b1/author
    groupby [$a2]
    return $a2 }
</authorlist>

Query Q:
for $b0 in $doc//book, $t0 in $b0/title, $a in $b0/author
groupby $a
return <bibentry>
  { $a,
    for $b in $doc//book, $a1 in $b/author, $t in $b/title
    where $a1 eq $a
    groupby [$b], [$t]
    return $t }
</bibentry>

[Figure: nested patterns for V and Q.
B1(V): $doc // book($b1) / title($t1); groupby [$b1], [$t1]; return <authorlist> $t1, B2(V) </authorlist>.
B2(V): book($b1) / author($a2); groupby [$a2]; return $a2.
B1(Q): $doc // book($b0) with children title($t0) and author($a); groupby $a; return <bibentry> $a, B2(Q) </bibentry>.
B2(Q): $doc // book($b) with children title($t) and author($a1); groupby [$b], [$t]; return $t.]

28

Outline

• NEXT (NEsted XML Tableaux)
• Rewriting Algorithm and Extensions
• Experiments
• Previous Work
• Conclusions

29

Architecture of the NEXT framework

[Figure: architecture pipeline. XQuery query and views → Normalization → Nested XML Tableaux (NEXT) → Minimization and Rewriting Using Views → Nested XML Tableaux (NEXT). From there, either Logical Optimization → Logical Plan → Plan Execution Engine, or Translate to XQuery → independent XQuery processor. Callout on Rewriting Using Views: rewriting algorithm.]

30

Overview of the Rewriting Algorithm

Input: query Q, views V

1. detect alternative access paths towards the variable bindings through the views

2. build a candidate rewriting R that uses only the access paths from phase 1

3. check that R is equivalent to Q

REWRITING ALGORITHM

[Figure: query Q → access paths through V → access paths (candidate rewriting)]

31

Step 1: Detect View Access Paths

• access paths: ways of accessing data using the view
• identify matching subqueries (extended tree pattern matching)
• find a mapping and add navigation from the view return

[Figure: view query body. B1(V): $doc // book($b1) / title($t1); B2(V): book($b1) / author($a2); return <authorlist> $t1, B2(V) </authorlist>, with B2(V) returning $a2.
Query patterns: $doc // book($b0) with children title($t0) and author($a); $doc // book($b) with children title($t) and author($a1).]

32

Step 1: Detect View Access Paths

• access paths: ways of accessing data using the view
• identify matching subqueries (extended tree pattern matching)
• find a mapping and add navigation from the view return

[Figure: the view query body and the query patterns as on the previous slide. Extended query: the mapping adds the navigation $docV / authorlist($p0) / title($t2).]

33

Step 1: Detect View Access Paths

• access paths: ways of accessing data using the view
• identify matching subqueries (extended tree pattern matching)
• find a mapping and add navigation from the view return
• and another one…

[Figure: the view query body and the query patterns as on the previous slide. Extended query: $docV / authorlist($p0) now has children title($t2) and author($a3).]

34

Step 1: Detect View Access Paths

• access paths: ways of accessing data using the view
• identify matching subqueries (extended tree pattern matching)
• find a mapping and add navigation from the view return
• and another one…
• computing all such mappings yields a query extension that uses only view access paths

[Figure: the view query body and the query patterns as on the previous slides. Extended query, with the query extension added: $docV / authorlist($p0) with children title($t2) and author($a3); $docV / authorlist($p) with children title($t3) and author($a4).]
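The search for mappings in step 1 can be sketched on plain tree patterns. The code below is a simplified illustration in Python, not the algorithm from the paper: it ignores nesting, joins and groupby lists, enumerates mappings by brute force, and all names (`Node`, `below`, `mappings`) are hypothetical. A child edge of the view pattern must map to a child edge of the query pattern, while a descendant edge may map to any path going down.

```python
from itertools import product


class Node:
    """A pattern node: element label, variable name, outgoing edges."""

    def __init__(self, label, var, children=()):
        self.label, self.var = label, var
        self.children = list(children)  # list of (axis, Node); axis is "/" or "//"


def below(q, axis):
    """Query nodes that a view edge with the given axis may lead to from q."""
    if axis == "/":
        return [c for a, c in q.children if a == "/"]
    found = []
    for _, c in q.children:  # "//": any node strictly below q
        found.append(c)
        found.extend(below(c, "//"))
    return found


def mappings(v, q):
    """All mappings of the view subtree rooted at v that send v to q."""
    if v.label != q.label:
        return []
    per_child = []
    for axis, vc in v.children:
        options = [m for qc in below(q, axis) for m in mappings(vc, qc)]
        if not options:
            return []
        per_child.append(options)
    result = []
    for combo in product(*per_child):
        m = {v.var: q.var}
        for part in combo:
            m.update(part)
        result.append(m)
    return result


# B1(V): $doc // book($b1) / title($t1)
view = Node("doc", "$doc", [("//", Node("book", "$b1", [("/", Node("title", "$t1"))]))])

# The two book patterns of query Q, both rooted at $doc.
query = Node("doc", "$doc", [
    ("//", Node("book", "$b0", [("/", Node("title", "$t0")), ("/", Node("author", "$a"))])),
    ("//", Node("book", "$b", [("/", Node("title", "$t")), ("/", Node("author", "$a1"))])),
])

found = mappings(view, query)
assert len(found) == 2
```

Each of the two mappings found corresponds to one copy of the navigation into the view output that the slides add to the query: one under authorlist($p0) and one under authorlist($p).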

35

Step 2: Candidate Rewriting

• same return function as the initial query, but with other variable bindings

[Figure: original query. B1(Q): $doc // book($b0) with children title($t0) and author($a); groupby $a; return <bibentry> $a, B2(Q) </bibentry>. B2(Q): $doc // book($b) with children title($t) and author($a1); groupby [$b], [$t]; return $t.
Extended query: $docV / authorlist($p0) with children title($t2) and author($a3); $docV / authorlist($p) with children title($t3) and author($a4).]

36

Step 2: Candidate Rewriting

• same return function as the initial query, but with other variable bindings

[Figure: original query. B1(Q): $doc // book($b0) with children title($t0) and author($a); groupby $a. B2(Q): $doc // book($b) with children title($t) and author($a1); groupby [$b], [$t].
Candidate rewriting. B1(R): $docV / authorlist($p0) with children title($t2) and author($a3); groupby $a3; return <bibentry> $a3, B2(R) </bibentry>. B2(R): $docV / authorlist($p) with children title($t3) and author($a4); groupby [$t3]; return $t3.]

37

Step 3: Equivalence Check

• check that R ≡ Q: containment mappings defined on the tree of query blocks

• and then (optional step) translate back to XQuery

[Figure: candidate rewriting. B1(R): $docV / authorlist($p0) with children title($t2) and author($a3); groupby $a3; return <bibentry> $a3, B2(R) </bibentry>. B2(R): $docV / authorlist($p) with children title($t3) and author($a4); groupby [$t3]; return $t3.]

Rewriting R:
for $a3 in distinct-values($docV/authorlist[title]/author)
return <bibentry>
  { $a3,
    for $p in $docV/authorlist, $t3 in $p/title
    where some $a4 in $p/author satisfies $a4 eq $a3
    return $t3 }
</bibentry>

38

Under the Hood

• two types of equality: by value and by node id
  – mappings must take it into consideration
  – the groupby clause also

• XQuery results have order. We consider rewritings that:
  – do not respect order (for DB-centric applications)
  – respect order (for text-centric applications)

• for rewritings that respect order: look for an ordering of the view access paths that preserves the original query order (details in the paper)
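The difference between the two kinds of equality can be shown with a few lines of Python. This is only an illustration on hypothetical objects. Reading groupby $a as value-based and groupby [$b] as identity-based is an assumption inferred from how the slides normalize distinct-values, not something the slides state.

```python
class Author:
    """A hypothetical element node; two nodes can carry the same text value."""

    def __init__(self, text):
        self.text = text


# The same name appearing under two different books gives two distinct nodes.
a1, a2 = Author("Ann"), Author("Ann")
nodes = [a1, a2, a1]

by_value = {n.text for n in nodes}    # grouping by value: one group
by_node_id = {id(n) for n in nodes}   # grouping by node identity: two groups

assert len(by_value) == 1
assert len(by_node_id) == 2
```

A mapping that identifies two variables by value is therefore weaker than one that identifies them by node id, which is why both the mappings and the groupby lists have to track the kind of equality used.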

39

Extensions to NEXT

• Extended NEXT to NEXT+:
  – extend the pattern-based representation to the whole XQuery
  – functions and other expressions (negation, disjunction, aggregates etc.) modeled as uninterpreted functions

• Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks

Example:
for $x in $doc/book
where count( for $a in $x/author
             where $x/price eq 60
             groupby [$a]
             return $a )
      eq count( … )
groupby $x
return $x

40

Extensions to NEXT

• Extended NEXT to NEXT+:
  – extend the pattern-based representation to the whole XQuery
  – functions and other expressions (negation, disjunction, aggregates etc.) modeled as uninterpreted functions

• Extended the algorithm to use NEXT+: need to identify maximal subparts that are pure NEXT blocks.

Example:
for $x in $doc/book
where count( for $a in $x/author
             where $x/price eq 60
             groupby [$a]
             return $a )
      eq count( … )
groupby $x
return $x

Annotations on the example:
  – rewrite outer block, disregarding function calls
  – rewrite blocks inside function arguments, with free variables bound in upper blocks

41

Formal Guarantees

• The rewriting algorithm is sound
• and complete for a large fragment of XQuery (the one that can be translated into NEXT), without order
  – Completeness means that if there are any rewritings, we are guaranteed to find at least one.

• There is no hope for completeness for
  – ordered rewritings: equivalence is undecidable
  – expressions beyond NEXT: negation and universal quantification also lead to undecidability

In these cases, our algorithm is a best-effort approach, with guaranteed soundness.

42

Implementation (considerations)

• completeness guarantees come at a price: we must compute mappings between view and query patterns

• in general, this is NP-complete, but PTIME if the patterns are trees (no equality conditions): based on M. Yannakakis, Algorithms for acyclic database schemes, 1981

• our goal: design an implementation whose running time is polynomial for pure tree patterns and degrades progressively with the number of added joins

43

Implementation in practice

• when computing the query plan, apply techniques from the Yannakakis algorithm: push projections & selections

• performance degrades with the number of equalities: the problem is NP-complete in the width of the view pattern (see the paper) and is in PTIME when there are no join equalities.

[Figure: V is compiled into a query plan (SPJ); Q is compiled into an XML instance; evaluating the plan over the instance yields the mappings.]

44

Outline

• NEXT (NEsted XML Tableaux)
• Rewriting Algorithm and Extensions
• Experiments
• Previous Work
• Conclusions

45

Experiments: Design

• The running time of the algorithm increases with:
  – number of nested levels: mappings are block by block
  – size of the pattern: # of mapped and target nodes increases
  – number of views: more patterns to match

• Our experiments measured how the algorithm scales with these parameters.

• We designed a configuration where we generated queries and views of increasing size and nesting depth.

EXPERIMENTS

46

Experiments: Implementation

Queries & views with similar basic patterns, in a vertical chain of blocks.

[Figure: the basic pattern has nodes $doc, mk, a and ci. Block Bk holds the basic patterns over mk (with c1, c2, …); block Bk+1 holds the basic patterns over mk+1 (with c1, c2, …).]

Irrelevant views don't matter (they can be quickly discarded). We create only relevant views (with mappings into the query):
  – split the query recursively into fragments = views
  – make them overlap on basic patterns

47

Experiments: Good Scalability

d = depth (# of nested levels in a query)

b = breadth (# of basic patterns in a block)

EXPERIMENTS

1.25s for d=16, b=16 and 128 views

48

Previous work

• rewriting XPath queries using XPath views
  Rewriting XPath Queries Using Materialized Views, W. Xu et al., VLDB 2005

• rewriting XQuery using XPath views
  A Framework for Using Materialized XPath Views in XML Query Processing, A. Balmin et al., VLDB 2004

• rewrite an XQuery with only one XQuery view that has to contain the query
  ACE-XQ: A CachE-aware XQuery Answering System, L. Chen et al., WebDB 2002

• caching common XQuery subexpressions
  Implementing Memoization in a Streaming XQuery Processor, Y. Diao et al., XSym 2004

49

Conclusions

• NEXT is a pattern-based representation that describes what the query result is and not how it is computed, which opens more opportunities for semantic optimizations

• extensible to all of XQuery, using NEXT+

• rewriting using views algorithm
  – sound for the whole language
  – complete for a large fragment of XQuery
  – good scalability
  – independent of the underlying algebra of the query processor

50

Online Demo

http://db.ucsd.edu/reform