Inbal Yahav
A Framework for Using Materialized XPath Views in XML
Query Processing
VLDB ‘04
DB Seminar, Spring 2005
By: Andrey Balmin, Fatma Ozcan, Kevin S. Beyer, Roberta J. Cochrane, Hamid Pirahesh
Prologue: Materialized Views vs. Views
Materialized XPath views vs. materialized views (relational databases)
?
Agenda
Introduction
Materialized XPath Views
XPath Matching Algorithm
Definitions
Description
Examples
Complexity
Compensation Expression
Additional References
Introduction
With increasing amounts of data presented and exchanged as XML documents, there is a need to query those documents efficiently.
To address this need:
W3C has proposed an XML query language, XQuery (discussed later)
ANSI and ISO have defined SQL / XML (extends relational databases to handle XML)
Both use XPath to navigate through the XML document.
XQuery: an XML query language
Input: XML document (tree)
Output: XML subtree
Syntax: FLWOR (For-[Let]-Where-[Order]-Return)
Example:
[Figure: an XML document, the XQuery query Q1, and its result]
XPath Complexity
The containment problem, as discussed in “Containment and Equivalence for an XPath Fragment” by G. Miklau and D. Suciu (to be reviewed in the 8th lecture), is shown to be coNP-complete.
Meaning: the non-containment problem T’ ⊈ T is NP-complete, so no polynomial-time algorithm for it is known (and none exists unless P = NP).
Just for intuition:
Consider the query: //A//b
And the following XML document:
[Figure: an XML tree of nested A elements]
Materialized XPath Views: The Goal
In relational databases, indexes and materialized views are two well-known techniques to accelerate processing of expensive SQL queries.
We would like to expand the materialized views idea to speed up processing of XQuery or SQL / XML queries.
The System
Suggest a materialized XPath view structure.
Define an algorithm that checks whether a given view can be used to answer a given query. (Match algorithm)
Define a compensation expression that computes the query result using the information available from the view.
Materialized XPath Views: Storing XPath Views
To be able to answer all kinds of queries, the view structure should contain:
Node path (a list of ancestors)
Typed values
Reference to the node
Note that storing only typed values and node paths (without references to the document nodes) will not allow us to execute queries whose result is a node collection.
Sometimes it is also beneficial to store actual copies of XML fragments in an XPath view.
Materialized XPath Views: Example
V = //@*
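A minimal Python sketch of how the view V = //@* could be materialized, assuming a row layout of (node path, value, node reference). The names ViewRow and materialize_all_attributes and the sample document are hypothetical, and ElementTree keeps attribute values as plain strings, so the "typed value" here is only a stand-in.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class ViewRow:
    path: tuple        # element names from the root to the owner, then "@attribute"
    value: str         # attribute value (a string here; a real system would store a typed value)
    node: ET.Element   # reference to the element that owns the attribute

def materialize_all_attributes(root):
    """Collect one row per attribute in the document, i.e. the result of //@*."""
    rows = []

    def visit(elem, ancestors):
        here = ancestors + (elem.tag,)
        for name, value in elem.attrib.items():
            rows.append(ViewRow(here + ("@" + name,), value, elem))
        for child in elem:
            visit(child, here)

    visit(root, ())
    return rows

# Hypothetical sample document, not taken from the slides.
doc = ET.fromstring('<orders><order id="1"><lineitem price="70"/></order></orders>')
rows = materialize_all_attributes(doc)
for row in rows:
    print("/".join(row.path), row.value, row.node.tag)
```

Keeping the node reference in each row is what lets a later query return document nodes rather than only values.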
XPath Matching Algorithm
After defining the materialized XPath view structure, we have to make sure that a given view can be utilized in a user query.
Definitions
We represent XPath expressions as labeled binary trees, called XPS (XPath Step) trees.
An XPS node represents a step in the XPath statement and is labeled with:
Axis: {“root”, “child”, “descendant”, “self”, “attribute”, “descendant-or-self”, “parent”}
Test: {name test, wildcard test (*), kind test (node(), text())}
Example: //employees/*/employee[@salary] has four steps (nodes 1 to 4), each labeled with an axis and a test, for instance:
Axis: dos (descendant-or-self), Test: employees
Axis: descendant, Test: *
XPath Matching Algorithm: Definitions (cont.)
Each XPS node has two children:
The first child is called predicate (and / or, a comparison operator, a constant, or an XPS node)
The second child, called next, points to the next step (an XPS node)
If one of the children does not exist, we represent it with null.
For example:
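One possible way to write this structure down in Python, as a sketch rather than the representation used by the authors. The class name XPS, the helper main_path, the explicit root node and the reading of "//" as a "descendant" axis are assumptions; predicates are limited to a single XPS node, so and / or nodes, comparison operators and constants are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class XPS:
    axis: str                       # e.g. "root", "child", "descendant", "attribute"
    test: str                       # name test, "*" or a kind test such as "node()"
    pred: Optional["XPS"] = None    # first child: predicate (None plays the role of null)
    next: Optional["XPS"] = None    # second child: next step (None plays the role of null)

def main_path(node):
    """Follow the next children and return the (axis, test) labels along the way."""
    labels = []
    while node is not None:
        labels.append((node.axis, node.test))
        node = node.next
    return labels

# //employees/*/employee[@salary], with an assumed root node on top.
tree = XPS("root", "", next=
       XPS("descendant", "employees", next=
       XPS("child", "*", next=
       XPS("child", "employee", pred=XPS("attribute", "salary")))))

print(main_path(tree))
```

The predicate [@salary] hangs off the last step as its pred child, while the steps themselves are chained through next.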
XPath Matching Algorithm: Transition Rules
XPath Matching Algorithm: General
The algorithm computes all possible mappings from XPS nodes of the view to XPS nodes of the query expression, in a single top-down pass of the view expression.
The basic algorithm deals only with and / or predicates and with the child, attribute and descendant axes.
Each function of the table evaluates to a Boolean.
Important! The view expression cannot be more restrictive than the query.
XPath Matching Algorithm: matchStep(v,q)
Rule 1.1: It is sufficient for one of the conjuncts to be mapped by a node of v. (For the other conjunct we use the reference.)
Rule 1.2: If the view is more restrictive than the query, it cannot be used. (For example: Q2)
V = //*[@*]: XPS1(root), XPS2(//*), XPS3(@*)
Q2 = //order/lineitem[@price and discount]: XPS4(root), next XPS5(//order), next XPS6(/lineitem), whose predicate is AND7 over XPS8(@price) and XPS9(/discount)
Q3 = //lineitem[@price or price]: XPS10(root), next XPS11(//lineitem), whose predicate is OR12 over XPS13(@price) and XPS14(/price)
All other children are null.
XPath Matching Algorithm: matchStep(v,q) (cont.)
Rule 1.3: When the view node contains a “descendant” axis, we keep looking for matches down the tree.
Rule 1.4: If the axis matches, we try to match the predicate and the next children of the view node. (axis {predicate, child, attribute})
Rule 1.5: If none matches, the algorithm returns false.
XPath Matching Algorithm: matchChildren(v,q)
Rule 2.1: If the tests match, we try to match the predicate and the next step of v.
matchPred(vpred,q)
Rule 3.1: If v does not have a predicate, then the step trivially matches.
XPath Matching Algorithm: matchPred(vpred,q) (cont.)
Rule 3.2: If v has a predicate and q does not (meaning: v is more restrictive than q), then the match fails.
Rule 3.3 and 3.4: Match both conjuncts in case of a conjunction, and one disjunct in case of a disjunction. (For example, if v contains all the orders with both price and amount attributes, it cannot be used for a query that requires either price or amount.)
Rule 3.5: Match both predicates, and the view’s predicate with query’s next child, in case of nested XPath expressions.
XPath Matching Algorithm: matchNext(vnext,q)
Rule 4.1 and 4.2: Same as 3.1 and 3.2.
Rule 4.3: Match next children, and the view’s next child with the query’s predicate, in case of nested XPath expressions.
(For example:
v = //a[b/c]
q = //a/b[c] )
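The four functions can be sketched in Python as follows. This is a reading of the rules as described on the slides, not the authors' transition table (which is not reproduced here): the classes XPS and BoolOp, the helper step_fits and the exact conditions are assumptions, only the child, attribute and descendant axes plus and / or predicates are covered, and nothing is recorded for compensation.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class XPS:
    axis: str                          # "root", "child", "attribute" or "descendant"
    test: str                          # element or attribute name, or "*"
    pred: Optional["Node"] = None
    next: Optional["XPS"] = None

@dataclass(frozen=True)
class BoolOp:
    op: str                            # "and" or "or"
    left: "Node"
    right: "Node"

Node = Union[XPS, BoolOp]

def step_fits(v, q):
    """Assumed compatibility check: the view step must be at least as general."""
    axis_ok = v.axis == q.axis or (v.axis == "descendant" and q.axis == "child")
    test_ok = v.test == "*" or v.test == q.test
    return axis_ok and test_ok

def match_step(v, q):
    if q is None:
        return False
    if isinstance(q, BoolOp):
        left, right = match_step(v, q.left), match_step(v, q.right)
        # Query "and": one conjunct is enough. Query "or": both disjuncts are needed.
        return (left or right) if q.op == "and" else (left and right)
    if step_fits(v, q) and match_children(v, q):
        return True
    if v.axis == "descendant":         # keep looking further down the query tree
        return match_step(v, q.pred) or match_step(v, q.next)
    return False

def match_children(v, q):
    return match_pred(v.pred, q) and match_next(v.next, q)

def match_pred(vpred, q):
    if vpred is None:                  # no view predicate: trivially matches
        return True
    if isinstance(vpred, BoolOp):
        left, right = match_pred(vpred.left, q), match_pred(vpred.right, q)
        return (left and right) if vpred.op == "and" else (left or right)
    # The view predicate may map to the query's predicate or to its next step.
    return match_step(vpred, q.pred) or match_step(vpred, q.next)

def match_next(vnext, q):
    if vnext is None:                  # the view ends here; the rest is compensated
        return True
    # The view's next step may map to the query's next step or to its predicate.
    return match_step(vnext, q.next) or match_step(vnext, q.pred)

def root(step):
    return XPS("root", "", next=step)

# v = //a[b/c] against q = //a/b[c]
v1 = root(XPS("descendant", "a", pred=XPS("child", "b", next=XPS("child", "c"))))
q1 = root(XPS("descendant", "a", next=XPS("child", "b", pred=XPS("child", "c"))))

# v = //a[b] against q = //a (the view is more restrictive)
v2 = root(XPS("descendant", "a", pred=XPS("child", "b")))
q2 = root(XPS("descendant", "a"))

# v = //o[@price and @amount] against q = //o[@price or @amount]
v3 = root(XPS("descendant", "o", pred=BoolOp("and", XPS("attribute", "price"), XPS("attribute", "amount"))))
q3 = root(XPS("descendant", "o", pred=BoolOp("or", XPS("attribute", "price"), XPS("attribute", "amount"))))

print(match_step(v1, q1), match_step(v2, q2), match_step(v3, q3))
```

Under these assumptions the first pair matches, because the view predicate b/c is mapped onto the query's next step b[c], while the other two pairs are rejected because the view is more restrictive than the query.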
XPath Matching Algorithm: Example
Consider the following hierarchy of employees:
Each employee has:
Name, Salary and Bonus attributes
Zero or more sub-employee elements
XPath Matching Algorithm: Example (cont.)
Consider the following view: //Employee//@*, which contains all attributes in the sub-tree of any employee,
and the query: //Employee[@Bonus]/Employee[@Bonus]/@Salary, which asks for the salary of employees who, together with their direct managers, have bonuses.
V = //Employee//@*: XPS1(root), next XPS2(//Employee), next XPS3(desc-or-self::*), next XPS4(@*); all predicates are null.
Q = //Employee[@Bonus]/Employee[@Bonus]/@Salary: XPS11(root), next XPS12(//Employee) with predicate XPS13(@Bonus), next XPS14(/Employee) with predicate XPS15(@Bonus), next XPS16(@Salary).
matchStep(1,11) = matchChildren(1,11) = matchPred(null,11) ^ matchNext(2,11)
matchPred(null,11) = T
Next: matchNext(2,11)
matchNext(2,11) = matchStep(2,null) ∨ matchStep(2,12)
matchStep(2,null) = F
matchStep(2,12) = matchChildren(2,12) ∨ matchChildren(2,14) (matchChildren(2,14) is evaluated later)
Next: matchChildren(2,12)
matchChildren(2,12) = matchPred(null,12) ^ matchNext(3,12)
matchPred(null,12) = T
matchNext(3,12) = matchStep(3,13) ∨ matchStep(3,14) (matchStep(3,13) is evaluated later)
Next: matchStep(3,14)
matchStep(3,14) = matchChildren(3,14) = matchPred(null,14) ^ matchNext(4,14)
matchPred(null,14) = T
matchNext(4,14) = matchStep(4,15) ∨ matchStep(4,16) = matchChildren(4,15) ∨ matchChildren(4,16)
All four calls evaluate to T.
XPath Matching Algorithm
matchStep(XPS1, XPS11) = T
  matchChildren(XPS1, XPS11) = T
    matchPred(null, XPS11) = T
    matchNext(XPS2, XPS11) = T
      matchStep(XPS2, null) = F
      matchStep(XPS2, XPS12) = T
        matchChildren(XPS2, XPS12) = T
          matchPred(null, XPS12) = T
          matchNext(XPS3, XPS12) = T
            matchStep(XPS3, XPS13) = F
            matchStep(XPS3, XPS14) = T
        matchChildren(XPS2, XPS14) = T (same way)
XPath Matching Algorithm: Recording the Match
Consider the following example:
View v consists of n nodes: //a//a…//a
Query q consists of m nodes: /a/a…/a
where m > n.
Any view node nv can map to any query expression node nq such that nv’s parent maps to some ancestor of nq. Hence, there are C(m, n) ("m choose n") distinct tree mappings of the view to the query expression.
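View tree: XPSv0(root), XPSv1(//a), XPSv2(//a), ..., XPSvn(//a), chained through their next children; all predicates are null.
Query tree: XPSq0(root), XPSq1(/a), XPSq2(/a), XPSq3(/a), ..., XPSqm(/a), chained through their next children; all predicates are null.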
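To make the count concrete, here is a small Python sketch that enumerates the mappings for this chain example by brute force. The function name tree_mappings is hypothetical, and the sketch assumes that a mapping is fully described by the increasing sequence of query positions chosen for the view steps.

```python
from itertools import combinations
from math import comb

def tree_mappings(n, m):
    """List every way to map view steps 1..n onto query steps 1..m in increasing order."""
    return list(combinations(range(1, m + 1), n))

mappings = tree_mappings(2, 4)
for mapping in mappings:
    print(mapping)          # (1, 3) stands for v1 -> q1 and v2 -> q3
print(len(mappings), comb(4, 2))
```

The number of mappings grows as the binomial coefficient of m and n, which is why enumerating them one by one does not scale.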
XPath Matching Algorithm: Solution
Keep track of all mappings in a match matrix structure (matches between v and q).
The match matrix allows us to encode an exponential number of tree mappings in a polynomial-size structure, by recording all possible contexts for each node mapping.
It also reduces the running time of the algorithm to polynomial.
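The effect of caching each (view node, query node) pair can be sketched in Python with memoization standing in for the match matrix. The function count_mappings is hypothetical and handles only the chain example (a view //a//a…//a with n steps against a query /a/a…/a with m steps); it counts the mappings while evaluating each pair (i, j) a single time.

```python
from functools import lru_cache
from math import comb

def count_mappings(n, m):
    """Return (number of tree mappings, number of distinct (i, j) pairs evaluated)."""
    evaluated = 0

    @lru_cache(maxsize=None)          # plays the role of the match matrix
    def ways(i, j):
        # Number of ways to map view steps i..n-1 onto query steps j..m-1.
        nonlocal evaluated
        evaluated += 1
        if i == n:
            return 1
        # View step i may map to any remaining query step k; the rest continues below k.
        return sum(ways(i + 1, k + 1) for k in range(j, m))

    return ways(0, 0), evaluated

total, evaluated = count_mappings(10, 20)
print(total, evaluated)
```

Each cached pair still loops over up to m successors, which mirrors the |V| * |Q| cells and at most |Q| expansions per cell discussed in the complexity slides.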
XPath Matching Algorithm: Match Matrix Structure
v \ q            | XPS11 root | XPS12 //Employee | XPS13 @Bonus | XPS14 /Employee | XPS15 @Bonus | XPS16 @Salary
XPS1 root        | T          |                  |              |                 |              |
XPS2 //Employee  |            | T                |              | T               |              |
XPS3 dos::*      |            | T                |              | T               |              |
XPS4 @*          |            |                  | T            |                 | T            | T
Each row corresponds to an XPS node of the view tree.
Each column corresponds to an XPS node of the query tree.
Each cell may contain one of three possible values: Empty, True or False.
Edges represent the context in which the mapping was detected.
For example, in the previous XPS trees: matchStep(XPSv2, XPSq2) will be called twice, but will be calculated only once.
Extensions: Handling Comparison Predicates
The algorithm as shown so far lacks rules to match comparison operations. To solve this problem, the XPS trees are preprocessed so that comparison operations are transformed into filters. After a successful match, for each filter in the view tree, we have to check that the filter of the matched node in the query tree is at least as specific.
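V = //Order/*[Price>60]
[Figure: the XPS tree of V before and after preprocessing. Before: XPS1(//Order), XPS2(/*), and a ">" node over XPS3(@Price) and the constant 60. After: XPS1(//Order), XPS2(/*), and XPS3(@Price) carrying "Filter: XPS3>60".]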
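The "at least as specific" check can be illustrated with a small Python sketch. It assumes that a filter is a pair of a comparison operator and a numeric constant over the same node; the function name implies and this encoding are hypothetical and are not taken from the paper.

```python
def implies(query_filter, view_filter):
    """True if every value that passes the query filter also passes the view filter."""
    q_op, q_c = query_filter
    v_op, v_c = view_filter
    if v_op == ">":
        return (q_op == ">" and q_c >= v_c) or (q_op in (">=", "=") and q_c > v_c)
    if v_op == ">=":
        return q_op in (">", ">=", "=") and q_c >= v_c
    if v_op == "<":
        return (q_op == "<" and q_c <= v_c) or (q_op in ("<=", "=") and q_c < v_c)
    if v_op == "<=":
        return q_op in ("<", "<=", "=") and q_c <= v_c
    if v_op == "=":
        return q_op == "=" and q_c == v_c
    return False                       # unknown operator: be conservative

view_filter = (">", 60)                # the filter XPS3 > 60 from the view
print(implies((">", 100), view_filter))
print(implies((">", 50), view_filter))
```

A query asking for prices above 100 can be answered from a view that keeps prices above 60, whereas a query asking for prices above 50 cannot, because the view is missing the values between 50 and 60.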
XPath Matching Algorithm: Complexity
Space complexity:
Match matrix size: O(|V| * |Q|), where |V| is the number of XPS nodes in the view and |Q| is the number of XPS nodes in the query expression.
Each matrix cell can have at most |Q| incoming edges.
The number of edges in the matrix: O(|V| * |Q|^2)
XPath Matching Algorithm: Complexity (cont.)
Constructing the matrix:
The matchStep(v,q) function has only |V| * |Q| distinct sets of parameters.
Each match is calculated at most once.
In the worst case, a function call may expand into |Q| function calls.
Total running time: O(|V| * |Q|^2). Polynomial!
Compensation Expression
After a successful match between the query and the view, we have to extract the query result from the materialized view.
For simplicity we assume that there is only one possible mapping between the view and the query.
This extraction is called compensation, and is achieved in two steps:
Step 1: Eliminate unnecessary conditions. For example:
V = //a[@b]
Q = //a[@b and @c]
The query expression does not have to include the [@b] condition, since it is implied by the view.
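For purely conjunctive predicates, Step 1 can be pictured as a simple difference between the two condition lists. The Python sketch below uses plain strings for the conditions and a hypothetical function residual_conditions; it is only an illustration of the idea and does not handle disjunctions or nested paths.

```python
def residual_conditions(query_conditions, view_conditions):
    """Keep only the query conditions that the view does not already guarantee."""
    guaranteed = set(view_conditions)
    return [cond for cond in query_conditions if cond not in guaranteed]

# V = //a[@b], Q = //a[@b and @c]
print(residual_conditions(["@b", "@c"], ["@b"]))
```

Only the [@c] condition remains to be checked when the result is computed from the view.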
Compensation Expression
Step 2: The query statement is transformed, or relaxed, so that data can be extracted easily from the view table:
We define the last matched node of the view as the extraction point, and the last matched node of the query expression as the compensation root node. Then, we reconstruct Q into an equivalent expression that starts at the compensation root node.
Now, as in this example, item elements can be directly extracted from the view table.
Compensation Expression
Extract from the table
Use reference
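The two parts, extracting from the table and using the reference, can be sketched in Python for the query //order/lineitem[@price and discount] over a view of all attributes (//@*). The sample document, the row layout (path, value, node) and both function names are assumptions made for illustration, not the paper's compensation operators.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<orders>"
    "<order><lineitem price='70'><discount>5</discount></lineitem></order>"
    "<order><lineitem price='40'/></order>"
    "</orders>"
)

def materialize_attributes(root):
    """Hypothetical view table for //@*: one (path, value, node) row per attribute."""
    rows = []

    def visit(elem, ancestors):
        here = ancestors + (elem.tag,)
        for name, value in elem.attrib.items():
            rows.append((here + ("@" + name,), value, elem))
        for child in elem:
            visit(child, here)

    visit(root, ())
    return rows

def lineitems_with_price_and_discount(view_rows):
    result = []
    for path, value, node in view_rows:
        # Extract from the table: the stored path covers //order/lineitem/@price.
        if path[-3:] == ("order", "lineitem", "@price"):
            # Use reference: discount is not in the view, so look at the document node.
            if node.find("discount") is not None:
                result.append(node)
    return result

view_rows = materialize_attributes(doc)
matches = lineitems_with_price_and_discount(view_rows)
print([item.get("price") for item in matches])
```

The path test is answered from the view rows alone, and the document is touched only through the stored reference for the condition the view does not cover.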
Additional References
XQuery and SQL / XML: http://www.datadirect.com
XML Seminar: http://www.jgreen.de
General: http://msdn.microsoft.com
The End